Abstract. We develop a practical and novel method for inference on intersection
bounds, namely bounds defined by either the infimum or supremum of a parametric
or nonparametric function, or equivalently, the value of a linear programming problem
with a potentially infinite constraint set. Our approach is especially convenient in
models comprised of a continuum of inequalities that are separable in parameters, and also
applies to models with inequalities that are non-separable in parameters. Since analog
estimators for intersection bounds can be severely biased in finite samples, routinely
underestimating the length of the identified set, we also offer a (downward/upward)
median unbiased estimator of these (upper/lower) bounds as a natural by-product of our
inferential procedure. Furthermore, our method appears to be the first and currently
only method for inference in nonparametric models with a continuum of inequalities.
We develop asymptotic theory for our method based on the strong approximation of
a sequence of studentized empirical processes by a sequence of Gaussian or other
pivotal processes. We provide conditions for the use of nonparametric kernel and series
estimators, including a novel result that establishes strong approximation for general
series estimators, which may be of independent interest. We illustrate the usefulness
of our method with Monte Carlo experiments and an empirical example.

1. Introduction

This paper develops a practical and novel method for estimation and inference on
parameters restricted by intersection bounds. These are settings where the true parameter
value, say $\theta^*$, is known to lie within the bounds $[\theta^l(v), \theta^u(v)]$ for each value $v$ in a
possibly infinite set $\mathcal{V}$. The identification region for $\theta^*$ is then

$\Theta_I = \bigcap_{v \in \mathcal{V}} [\theta^l(v), \theta^u(v)] = [\sup_{v \in \mathcal{V}} \theta^l(v), \inf_{v \in \mathcal{V}} \theta^u(v)]$. (1.1)

Intersection bounds arise naturally from exclusion restrictions (Manski (2003)) and
appear in numerous applied and theoretical examples.^1 This paper covers both parametric
and non-parametric estimators of the bound-generating functions $v \mapsto \theta^u(v)$ and
$v \mapsto \theta^l(v)$, and also covers cases where the constraint set $\mathcal{V}$ is a continuum. Thus, this
paper improves upon prior approaches, which only treat finite constraint sets and
parametric estimation of bound-generating functions. More generally, the methods of this
paper apply to any estimator for the value of a linear programming problem with an 
infinite dimensional constraint set. 

This paper overcomes significant complications for estimation and inference in such 
contexts. First, since sample analogs of the lower and upper bounds of $\Theta_I$ are the
suprema and infima of estimated bound-generating functions, they have substantial fi- 
nite sample bias, and the estimated bounds tend to be much tighter than the population 
bounds. This has been noted by Manski and Pepper (2000, 2008), and some heuris- 
tic bias adjustments have been proposed by Haile and Tamer (2003) and Kreider and 
Pepper (2007). Second, the fact that the boundary estimates are suprema and infima 
of parametric or nonparametric empirical processes typically renders closed-form char- 
acterization of their asymptotic distributions unavailable or difficult to establish. As a 
consequence, researchers have typically used the canonical bootstrap for inference. Yet 
results from the recent literature indicate that the canonical bootstrap is not generally 
consistent in such settings, see e.g. Andrews and Han (2009), Bugni (2009), and Canay 
(2009). 



^1 Examples include monotone instrumental variables and the returns to schooling (Manski and Pepper
(2000)), English auctions (Haile and Tamer (2003)), the returns to language skills (Gonzalez (2005)), 
set identification with Tobin regressors (Chernozhukov, Rigobon, and Stoker (2007)), endogeneity with 
discrete outcomes (Chesher (2007)), changes in the distribution of wages (Blundell, Gosling, Ichimura, 
and Meghir (2007)), the study of disability and employment (Kreider and Pepper (2007)), estimation of 
income poverty measures (Nicoletti, Foliano, and Peracchi (2007)), unemployment compensation reform 
(Lee and Wilke (2009)), bounds on the distribution of treatment effects under strong ignorability (Fan 
(2009)), and set identification with imperfect instruments (Nevo and Rosen (2008)). 
We solve the problem of estimation and inference for intersection bounds by proposing
(downward or upward) median unbiased estimators of the upper and lower bounds, as
well as confidence intervals. Specifically, our approach employs a precision-correction to
the estimated bound-generating functions $v \mapsto \hat\theta^u(v)$ and $v \mapsto \hat\theta^l(v)$ before applying the
supremum and infimum operators. Indeed, we adjust the estimated bound-generating
functions for their precision by adding to each of them an appropriate critical value times
their pointwise standard error. Then, depending on the choice of the critical value, the
intersection of these precision-adjusted bounds provides (i) a downward median unbiased
estimator for the upper bound $\inf_{v \in \mathcal{V}} \theta^u(v)$ and an upward median unbiased estimator
for the lower bound $\sup_{v \in \mathcal{V}} \theta^l(v)$ and (ii) confidence sets for either the identified set
$\Theta_I$ or the true parameter value $\theta^*$.^2 We select the critical value either analytically
or via simulation of an approximating Gaussian process. Our method applies in both 
parametric and non-parametric settings. For both cases we provide formal justification 
via asymptotic theory based on the strong approximation of a sequence of studentized 
empirical processes by a sequence of Gaussian or other pivotal processes. This includes 
an important new result on strong approximation for series estimators that applies to any 
estimator that admits a linear approximation, essentially providing a functional central 
limit theorem for series estimators for the first time in the literature. In principle this 
functional central limit theorem covers linear and non-linear series estimators, both with 
and without endogeneity. 

This paper contributes to a growing literature on inference on set-identified param- 
eters bounded by inequality restrictions. The prior literature has focused primarily on 
models with a finite number of unconditional inequality restrictions. Some examples 
include Andrews and Jia (2008), Beresteanu and Molinari (2008), Chernozhukov, Hong, 
and Tamer (2007), Galichon and Henry (2009), Romano and Shaikh (2008), Romano 
and Shaikh (2009), and Rosen (2008), among others. To the best of our knowledge, our 
paper is the first to consider inference with a continuum of inequalities, which includes 
conditional moment inequalities as a particular case but also covers other examples such 
as conditional quantile inequalities. Recent papers (some in progress) on conditional 
moment inequalities, written independently and contemporaneously, include Andrews 
and Shi (2009), Fan (2009), Kim (2009), and Menzel (2009), and all employ different 



^2 We say an estimator is downward (upward) median unbiased if the probability it lies below (above)
its target value is less (greater) than or equal to one half asymptotically. Achieving exact median 
unbiasedness is not possible in full generality. 
approaches.^3 Our approach is especially convenient for performing inference in
parametric and non-parametric models with a continuum of inequalities that are separable
in parameters, and it also applies to inference in models with inequalities that are non- 
separable in parameters. Furthermore, our method appears to be the first and currently 
only method available for performing inference with fully nonparametric inequality re- 
strictions. An attractive feature of our approach is that in addition to providing a valid 
method of inference, we provide a novel construction for (downward or upward) me- 
dian unbiased estimators for (upper or lower) intersection bounds. In fact, the only 
difference in the construction of our estimators and confidence intervals is the choice of 
critical value, which is a quantile of an appropriate approximating distribution. Thus, 
practitioners need not implement two entirely different methods to construct estimators 
and confidence bands with desirable properties. 

We organize the paper as follows. In section 2, we motivate the analysis with examples 
and provide an informal overview of our results. In section 3 we provide a formal treat- 
ment of our method, providing conditions and theorems for validity in both parametric 
and nonparametric contexts. In section 4 we provide a Monte Carlo study, and in section 5
we give an empirical example. In section 6 we conclude. In the Appendix we provide 
proofs, establish strong approximation results for both series and kernel estimators, and 
describe the steps required to implement our method in practice. 

2. Motivating Examples and Informal Overview of Results 

In this section we briefly describe four examples of intersection bounds from the lit- 
erature and provide an informal overview of our results. 

Example 1: Treatment Effects and Instrumental Variables. In the analysis of 
treatment response, the ability to uniquely identify the distribution of potential outcomes 
is typically lacking without either experimental data or strong assumptions. This owes to 
the fact that for each individual unit of observation, only the outcome from the received 
treatment is observed; the counterfactual outcome that would have occurred given a 

^3 Some approaches, such as Andrews and Shi (2009), rely on Bierens type integrated moment tests and
some, such as Menzel (2009), on standard tests with finite inequalities, using an increasing number
of inequalities, both of which differ from the approach pursued here. Using goodness-of-fit tests as a
simple analogy, our approach is most similar to Kolmogorov-Smirnov type tests, whereas the approach
in Andrews and Shi (2009) appears similar to Bierens type tests, and Menzel (2009)'s approach appears
similar to Pearson type tests. Just as in the goodness-of-fit literature, none of the approaches are likely 
to universally dominate others since there are no uniformly most powerful tests in complex settings 
such as the one considered here. 
different treatment is not known. Though we focus here on treatment effects, similar
issues are present in other areas of economics. In the analysis of markets, for example, 
observed equilibrium outcomes reveal quantity demanded at the observed price, but do 
not reveal what demand would have been given other prices. 

To illustrate how bounds on treatment effects fit into our framework, first suppose 
only that the support of the outcome space is known, but no other assumptions are made 
regarding the distribution of counterfactual outcomes. Then Manski (1989) and Manski 
(1990) provide worst-case bounds on mean treatment outcomes for any treatment $t$
conditional on covariates $w$, $LB_{wc}(w,t) \le E[Y(t) \mid w] \le UB_{wc}(w,t)$. These bounds are
conditional expectations of observed random variables, and are thus trivially intersection
bounds where the intersection set is a singleton. If $w = (x, v)$ and $v$ is an instrumental
variable satisfying $E[Y(t) \mid x, v] = E[Y(t) \mid x]$, then the sharp bounds on $E[Y(t) \mid x]$ are
$LB_{iv}(x,t) \le E[Y(t) \mid x] \le UB_{iv}(x,t)$, where $LB_{iv}(x,t) = \sup_{v \in \mathcal{V}} LB_{wc}((x,v),t)$ and
$UB_{iv}(x,t) = \inf_{v \in \mathcal{V}} UB_{wc}((x,v),t)$. In this case the identified set is the intersection
over the support of the instrument $v$ of the worst-case bounds at $w = (x,v)$. Similarly,
bounds implied by restrictions such as monotone treatment response, monotone treat- 
ment selection, and monotone instrumental variables, as in Manski (1997) and Manski 
and Pepper (2000), also take the form of intersection bounds. In particular, the returns 
to schooling application of section 5 considers estimation and inference on intersection
bounds implied by joint monotone treatment selection and monotone instrumental vari- 
able restrictions. □ 
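
To make the intersection step of Example 1 concrete, the following minimal sketch intersects worst-case bounds over a finite instrument support. It is an illustration only: the function name `intersect_bounds` and all numbers are hypothetical, and the bounds are treated as known rather than estimated.

```python
from typing import Sequence, Tuple


def intersect_bounds(lower: Sequence[float], upper: Sequence[float]) -> Tuple[float, float]:
    """Intersect the intervals [lower[j], upper[j]] over all instrument values j.

    The intersection is [max of lower bounds, min of upper bounds].
    """
    if len(lower) != len(upper) or not lower:
        raise ValueError("need one lower and one upper bound per instrument value")
    return max(lower), min(upper)


# Hypothetical worst-case bounds on E[Y(t) | x] at three instrument values v.
lb_wc = [0.10, 0.30, 0.20]
ub_wc = [0.90, 0.70, 0.80]

lb_iv, ub_iv = intersect_bounds(lb_wc, ub_wc)
print(lb_iv, ub_iv)
```

With estimated rather than known bounds, taking the raw maximum and minimum is exactly the sample analog step whose finite-sample bias the paper sets out to correct.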



Example 2: Bounding Distributions to Account for Selection. Similar analy- 
sis to that of Manski (1994) and Manski and Pepper (2000) can be applied generally 
to inference on distributions whose observations are censored due to selection. Such 
an approach is employed by Blundell, Gosling, Ichimura, and Meghir (2007) to study 
changes in male and female wages, while accounting for the censoring of the wage dis- 
tribution incurred by selection into employment. The starting point of their analysis 
is that the cumulative distribution of wages at any point w, conditional on covariates x 
must satisfy the worst case bounds 

$F(w \mid x, E = 1) P(x) \le F(w \mid x) \le F(w \mid x, E = 1) P(x) + 1 - P(x)$,

where $E$ is an indicator of employment, and $P(x) = \Pr(E = 1 \mid x)$. This relation is then
used to bound quantiles of the distribution of wages conditional on covariates. The 
worst-case bounds are often not very informative, so additional restrictions motivated
by economic theory are used to tighten the bounds. 

One such restriction is an exclusion restriction of the continuous variable out-of-work 
income, $z$; see Blundell, Gosling, Ichimura, and Meghir (2007, pp. 331-333). Two such
possibilities are considered: the use of z as an excluded instrument, and the use of z as 
a monotone instrument. The former restriction implies 

$\max_{z} \{F(w \mid x, z, E = 1) P(x, z)\} \le F(w \mid x) \le \min_{z} \{F(w \mid x, z, E = 1) P(x, z) + 1 - P(x, z)\}$,

while the weaker monotonicity restriction implies that for any $z_0$ on the support of $Z$,

$\max_{z \ge z_0} \{F(w \mid x, z, E = 1) P(x, z)\} \le F(w \mid x, z_0) \le \min_{z \le z_0} \{F(w \mid x, z, E = 1) P(x, z) + 1 - P(x, z)\}$.

□ 



Example 3: English Auctions. Invoking two weak assumptions on bidder behavior 
in an independent private values paradigm, Haile and Tamer (2003) use the distribution 
of observed bids to formulate bounds on the distribution of bidders' valuations. The two 
assumptions on bidder behavior, which nest various equilibria, are that each bidder's bid 
is no greater than her valuation, and that bidders who did not win would not have been 
willing to pay more than the winning bid. Theorems 1 and 2 of Haile and Tamer (2003, 
pp. 7-10) give the following implied bounds on the cumulative distribution of valuations 
at any point v, 

$\max_{2 \le n \le M} \phi(G^{\Delta}_{n:n}(v); n-1, n) \le F(v) \le \min_{2 \le n \le M,\, 1 \le i \le n} \phi(G_{i:n}(v); i, n)$,

where $M$ is the number of potential bidders in an auction, and $n$ is the number who
actually submit bids. Here, $G_{i:n}$ denotes the distribution of the $i$-th order statistic of
bids, and $\phi(\cdot; i, n)$ is a monotone transformation relating any parent distribution $F$ to
the distribution of its $i$-th order statistic, i.e.

$F(v) = \phi(F_{i:n}(v); i, n)$.

$G^{\Delta}_{n:n}$ denotes the distribution of the $n$-th order statistic of bids, plus minimum bid
increment $\Delta$, in an auction of $n$ bidders. The derived bounds fall into the present framework,
as the distributions $G^{\Delta}_{n:n}(\cdot)$ and $G_{i:n}(\cdot)$ are identified and consistently estimable.
Example 4: Conditional Moment Inequalities. Our inferential method can also
be used to conduct pointwise inference on parameters in models comprised of conditional 
moment inequalities. This can be done whether the conditioning variables are discrete or 
continuous. Such restrictions arise naturally in empirical work in industrial organization 
and in particular in models of oligopoly entry, see for example Pakes, Porter, Ho, and 
Ishii (2005) and Berry and Tamer (2007). 

To illustrate, consider the restriction 

$E[m(x, \gamma_0) \mid v] \ge 0$ for every $v \in \mathcal{V}$, (2.1)

where $m(\cdot,\cdot)$ is a real-valued function, $(x, v)$ are random variables observable by the
econometrician, and $\gamma_0$ is the parameter of interest. For example, in a model of oligopoly
entry $\gamma_0$ could measure the effect of one firm's entry decision on a rival's profit. It may
be of interest to test whether $\gamma_0$ is equal to some conjectured value $\gamma$, e.g. $\gamma = 0$. To see
how our framework can be used to test this hypothesis, define $\theta(\gamma, v) := E[m(x, \gamma) \mid v]$
and let $\hat\theta(\gamma, v)$ be a consistent estimator. Suppose that we would like to test (2.1) at level $\alpha$
for the conjectured parameter value $\gamma_0 = \gamma$ against an unrestricted alternative. Under
some continuity conditions this is equivalent to the test of

$\inf_{v \in \mathcal{V}} \theta(\gamma, v) \ge 0$ against $\inf_{v \in \mathcal{V}} \theta(\gamma, v) < 0$.

Let $\theta_0(\gamma) := \inf_{v \in \mathcal{V}} \theta(\gamma, v)$. Our method for inference delivers a statistic

$\hat\theta_\alpha(\gamma) = \inf_{v \in \mathcal{V}} [\hat\theta(\gamma, v) + k \cdot s(\gamma, v)]$

such that $\lim_{n \to \infty} P(\theta_0(\gamma) \ge \hat\theta_\alpha(\gamma)) \le \alpha$. Here, $s(\gamma, v)$ is the standard error of $\hat\theta(\gamma, v)$
and $k$ is a critical value, which will be described below. If $\hat\theta_\alpha(\gamma) < 0$, then we reject the
null hypothesis, while if $\hat\theta_\alpha(\gamma) \ge 0$, then we do not reject. This provides a method for
pointwise inference on $\gamma_0$. □
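
The decision rule of Example 4 can be sketched for a finite conditioning set as follows. This is a simplified illustration, not the paper's procedure: it assumes the estimates at the $J$ grid points are independent and normal, so that the critical value is the $(1-\alpha)$ quantile of the maximum of $J$ independent standard normals. The function name `reject_null` and the input numbers are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def reject_null(theta_hat: Sequence[float], se: Sequence[float],
                alpha: float = 0.05) -> Tuple[bool, float]:
    """Test inf_v theta(gamma, v) >= 0 using the precision-corrected minimum.

    Assumption (for illustration): independent normal estimates, so the
    critical value k solves P(max of J iid N(0,1) <= k) = 1 - alpha.
    """
    J = len(theta_hat)
    k = NormalDist().inv_cdf((1.0 - alpha) ** (1.0 / J))
    stat = min(t + k * s for t, s in zip(theta_hat, se))
    return stat < 0.0, stat


se = [0.1, 0.1, 0.1]
# One estimated moment is far below zero relative to its standard error.
rejected, stat_a = reject_null([0.2, -0.5, 0.1], se)
# The negative estimate is within sampling error of zero.
not_rejected, stat_b = reject_null([0.2, -0.1, 0.1], se)
print(rejected, round(stat_a, 3), not_rejected, round(stat_b, 3))
```

Repeating the test over a grid of conjectured values $\gamma$ and collecting those that are not rejected would give a pointwise confidence region for $\gamma_0$.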



Informal Overview of Results. We now provide an informal description of our 
method for estimation and inference. Let $\theta^*$ denote the parameter of interest.
Consider an upper bound $\theta_0$ on $\theta^*$ of the form

$\theta^* \le \theta_0 := \inf_{v \in \mathcal{V}} \theta(v)$, (2.2)

where $v \mapsto \theta(v)$ is a bound-generating function, and $\mathcal{V}$ is the set over which the minimum
is taken. Likewise, there could be lower bounds defined symmetrically. Since our method
covers lower bounds in an analogous way, we focus on describing our method for (2.2).
We base estimation and inference on a uniformly consistent estimator $\{\hat\theta(v), v \in \mathcal{V}\}$ of
the bound-generating function, which could be parametric or nonparametric. 

What are good estimators and confidence regions for the bound $\theta_0$? The first and
perhaps simplest idea is to base estimation and inference on the sample analog: $\inf_{v \in \mathcal{V}} \hat\theta(v)$.
However, this estimator does not perform well in practice. First, the sample analog
estimator tends to be downward (optimistically) biased in finite samples. Second, and
perhaps more importantly, unequal sampling error of the estimator $\hat\theta(v)$ across $v$ can
overwhelm inference in finite samples. Indeed, different levels of precision of $\hat\theta(v)$ at
different points can severely distort the perception of the minimum of the bound-generating
function $\theta(v)$. Figure 1 illustrates these problems geometrically. The solid curve is the
true bound-generating function $v \mapsto \theta(v)$, and the dash-dotted thick curve is its estimate
$v \mapsto \hat\theta(v)$. The remaining dashed curves represent eight additional potential realizations
of the estimator, illustrating the precision of the estimator. In particular, we see that
the precision of the estimator is much lower on the right side than on the left. A naive
sample analog estimate for $\theta_0$ is provided by the minimum of the dash-dotted curve, but
this estimate can in fact be quite far away from $\theta_0$. This large deviation from the true
value arises from both the lower precision of the estimated curve on the right side of 
the figure and from the downward bias created by taking the minimum of the estimated 
curve. 

To overcome these problems, we propose a precision-corrected estimate of $\theta_0$:

$\hat\theta_0 := \inf_{v \in \hat V} [\hat\theta(v) + k \cdot s(v)]$, (2.3)

where $s(v)$ is the standard error of $\hat\theta(v)$, $\hat V$ is a data-dependent set that converges in
probability to a non-stochastic set $V$ that contains $V_0 := \arg\min_{v \in \mathcal{V}} \theta(v)$, and $k$ is a
critical value, whose construction we describe below. That is, our estimator $\hat\theta_0$ minimizes
the precision-corrected curve given by $\hat\theta(v)$ plus critical value $k$ times the pointwise
standard error $s(v)$. Figure 2 shows a precision-corrected curve as a dashed curve with
a particular choice of critical value $k$. In this figure, we see that the minimizer of
the precision-corrected curve can indeed be much closer to $\theta_0$ than the sample analog
$\inf_{v \in \mathcal{V}} \hat\theta(v)$. Although this illustration is schematic in nature, it conveys geometrically
why our approach can remove the downward bias. In what follows, we provide both
theoretical and Monte-Carlo evidence that further supports this point.
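
The contrast between the sample analog and the precision-corrected estimate can be illustrated with a small Monte Carlo sketch. Everything here is a hypothetical toy design rather than the paper's experiment: a five-point grid, independent normal estimates with known standard errors that worsen to the right, and $k$ set to the exact median of the maximum of five independent standard normals.

```python
import random
from statistics import NormalDist, median


def naive_and_corrected(theta_hat, se, k):
    """Return (sample-analog minimum, precision-corrected minimum)."""
    naive = min(theta_hat)
    corrected = min(t + k * s for t, s in zip(theta_hat, se))
    return naive, corrected


random.seed(0)
theta = [0.0, 0.0, 0.1, 0.2, 0.3]      # hypothetical true bound-generating function
se = [0.05, 0.05, 0.10, 0.20, 0.40]    # precision deteriorates to the right
theta0 = min(theta)
J = len(theta)

# Median of the max of J iid N(0,1) variables (toy stand-in for the critical value k).
k = NormalDist().inv_cdf(0.5 ** (1.0 / J))

naive_draws, corrected_draws = [], []
for _ in range(5000):
    theta_hat = [random.gauss(t, s) for t, s in zip(theta, se)]
    n, c = naive_and_corrected(theta_hat, se, k)
    naive_draws.append(n)
    corrected_draws.append(c)

median_naive = median(naive_draws)
median_corrected = median(corrected_draws)
print(median_naive, median_corrected, theta0)
```

In this design the sample analog typically falls below $\theta_0$, while the corrected estimate lies at or above $\theta_0$ in at least half of the replications, which is the downward median unbiasedness property described in the text.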
Let us now discuss the choice of the critical value $k$. Ideally, we would choose $k$ in
(2.3) as a quantile of the supremum of the normalized stochastic process

$Z_n(v) := \frac{\theta(v) - \hat\theta(v)}{s(v)}, \quad v \in \mathcal{V}$. (2.4)

In particular, for the purpose of estimation of $\theta_0$, we would like to set

$k = \mathrm{Median}\left[ \sup_{v \in \hat V} \frac{\theta(v) - \hat\theta(v)}{s(v)} \right]$, (2.5)

which gives us a downward median-unbiased estimate $\hat\theta_0$ of $\theta_0$. For the purpose of
inference on $\theta_0$, we would like to set

$k = (1 - \alpha)\text{-Quantile}\left[ \sup_{v \in \hat V} \frac{\theta(v) - \hat\theta(v)}{s(v)} \right]$, (2.6)

which gives us a one-sided $(1 - \alpha)$ confidence region $(-\infty, \hat\theta_0]$ for $\theta_0$. Of course, these
values of $k$ are unknown in practice, and we have to replace them with suitable estimates.

We estimate critical values as follows. Generally, the finite-sample distribution of the
process $Z_n = \{Z_n(v) : v \in \mathcal{V}\}$ is unknown, but we can approximate it uniformly by a
sequence of processes with a known (or at least estimable) distribution. Indeed, we can
approximate $Z_n$ uniformly by a sequence of processes $Z_n^*$, which are zero-mean Gaussian
or other pivotal processes with a known distribution, that is,

$a_n \sup_{v \in \mathcal{V}} |Z_n(v) - Z_n^*(v)| = o_p(1)$, (2.7)

for some sequence of constants $a_n$. Once we have $Z_n^*$, we consider the variable

$\mathcal{E}_n(V) = a_n [\sup_{v \in V} Z_n^*(v) - b_n]$ (2.8)

for some sequences of constants $a_n$ and $b_n$. Then we obtain the estimates of the $p$-th
quantile of $\mathcal{E}_n(V)$, denoted by $\hat c(p)$, by one of two methods:

1. Simulation Method, where we simulate the Gaussian process $Z_n^*(v)$ and compute
its quantiles numerically.

2. Analytical Method, where we use limit quantiles or approximate quantiles of
$\mathcal{E}_n(V)$, which we derive by limit arguments or Hotelling's tube method for the
suprema of Gaussian processes.

Finally, we then set the critical value $k := \hat b_n + \hat c(p)/\hat a_n$, where $\hat a_n$ and $\hat b_n$ consistently
estimate $a_n$ and $b_n$, respectively, and where $p = 1/2$ for estimation and $p = 1 - \alpha$ for
inference.
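
The Simulation Method can be sketched as follows. The sketch makes several assumptions that are not taken from the paper: the approximating process is $Z_n^*(v) = p(v)'N / \|p(v)\|$ with a hypothetical quadratic basis $p(v) = (1, v, v^2)$ and $N \sim N(0, I_3)$, the supremum is taken over a 51-point grid on $[0, 1]$, and the normalizing constants are fixed at $a_n = 1$, $b_n = 0$ so that $k = \hat c(p)$.

```python
import math
import random


def basis(v):
    """Hypothetical series basis p(v) = (1, v, v^2)."""
    return (1.0, v, v * v)


def simulate_sup(grid, n_draws, rng):
    """Draw sup over the grid of Z*(v) = p(v)'N / ||p(v)||, with N ~ N(0, I)."""
    dirs = []
    for v in grid:
        p = basis(v)
        norm = math.sqrt(sum(x * x for x in p))
        dirs.append(tuple(x / norm for x in p))  # unit variance at every v
    sups = []
    for _ in range(n_draws):
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        sups.append(max(sum(d * x for d, x in zip(direction, g)) for direction in dirs))
    return sups


def quantile(values, p):
    """Empirical p-th quantile (smallest order statistic with cdf >= p)."""
    xs = sorted(values)
    idx = min(len(xs) - 1, max(0, math.ceil(p * len(xs)) - 1))
    return xs[idx]


rng = random.Random(12345)
grid = [j / 50 for j in range(51)]
sups = simulate_sup(grid, 2000, rng)

k_median = quantile(sups, 0.5)   # p = 1/2: critical value for estimation
k_95 = quantile(sups, 0.95)      # p = 1 - alpha: critical value for inference
print(k_median, k_95)
```

In an application the basis, the covariance of the coefficient estimates, and the constants $a_n$, $b_n$ would come from the estimator actually used; the structure of the computation (simulate the process, take the supremum, read off a quantile) stays the same.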

At an abstract level our method does not distinguish parametric estimators of $\theta(v)$
from nonparametric estimators; however, details of the analysis and regularity conditions
are quite distinct. Specifically, in section 3, we divide the analysis into Donsker and
non-Donsker cases, corresponding approximately to parametric and non-parametric cases. In
both cases, we employ strong approximation analysis to approximate the quantiles of
$\mathcal{E}_n(V)$, and we verify our main conditions separately for each case.

An important input into our procedure is the choice of the estimator $\hat V$ of $V_0$, the
argmin set of the true bound-generating function. We describe a specific choice of
such an estimator in Section 3. At a general level we require $\hat V$ to be bigger than
(to include) $V_0$, with probability approaching one; at the same time, we require this
estimate not to be too much bigger than $V_0$.^4 The first requirement guarantees that we
are not performing overly optimistic inference, and the second requirement guarantees
that we are not performing overly pessimistic inference. Indeed, from (2.5) and (2.6)
we see that the critical value $k$ is increasing in the size of the set $\hat V$, so that a smaller
$\hat V$ leads to a lower (less conservative) $k$. A lower $k$ in turn leads to point estimates
with a less conservative bias-correction, and less conservative confidence intervals. A
good estimator $\hat V$ is therefore essential. We illustrate the gains that can be made from
estimating the argmin set $V_0$ in Figures 2 and 3. In Figure 2, we depict a
precision-corrected curve (dashed curve) that adjusts the boundary estimate $\hat\theta(v)$ (dotted curve)
by an amount proportional to its pointwise standard error using the conservative choice
$\hat V = \mathcal{V} = [0, 1]$. In Figure 3, we depict the same initial precision-corrected curve and
also a two-step precision-corrected curve (dash-dotted curve) that adjusts the boundary
estimate $\hat\theta(v)$ (dotted curve) by an amount proportional to its pointwise standard error
using a critical value that was computed using an estimate $\hat V$ of $V_0$, which is much less
conservative than using the entire set $\mathcal{V} = [0, 1]$. The gain from estimating the argmin
set $V_0$ here is that the minimum of this precision-corrected curve is now much closer to
the true minimum of the bound-generating function $\theta(v)$ than the minimum of the
initial precision-corrected curve.
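
The gain from restricting the supremum to an estimated argmin set can be sketched numerically. The set estimator below is a hypothetical level-set rule with an arbitrary slack constant, chosen only for illustration; the paper's own construction is given in its Section 3 and is not reproduced here. The approximating process uses the same hypothetical quadratic basis assumption $Z^*(v) = p(v)'N/\|p(v)\|$, $p(v) = (1, v, v^2)$, with $a_n = 1$ and $b_n = 0$.

```python
import math
import random


def unit_directions(grid):
    """Normalised hypothetical basis p(v)/||p(v)|| with p(v) = (1, v, v^2)."""
    dirs = []
    for v in grid:
        p = (1.0, v, v * v)
        norm = math.sqrt(sum(x * x for x in p))
        dirs.append(tuple(x / norm for x in p))
    return dirs


def sup_quantile(grid, p, n_draws=2000, seed=7):
    """p-th quantile of sup_v Z*(v) over `grid`; the fixed seed reuses the same draws."""
    rng = random.Random(seed)
    dirs = unit_directions(grid)
    sups = []
    for _ in range(n_draws):
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        sups.append(max(sum(d * x for d, x in zip(direction, g)) for direction in dirs))
    sups.sort()
    return sups[max(0, math.ceil(p * n_draws) - 1)]


full_grid = [j / 50 for j in range(51)]
theta_hat = [(v - 0.3) ** 2 for v in full_grid]   # hypothetical estimated bound function
se = [0.05] * len(full_grid)                      # hypothetical standard errors

# Step 1: conservative critical value over the whole set [0, 1].
k_full = sup_quantile(full_grid, 0.95)
step1 = min(t + k_full * s for t, s in zip(theta_hat, se))

# Step 2: hypothetical level-set estimate of the argmin set, then a new critical value.
slack = 2.0  # arbitrary tuning constant, an assumption of this sketch
kept = [(v, t, s) for v, t, s in zip(full_grid, theta_hat, se) if t <= step1 + slack * s]
v_hat = [v for v, _, _ in kept]
k_two_step = sup_quantile(v_hat, 0.95)
step2 = min(t + k_two_step * s for _, t, s in kept)
print(len(v_hat), k_full, k_two_step, step1, step2)
```

Because the supremum over a subset can never exceed the supremum over the full set draw by draw, the two-step critical value is no larger than the conservative one, and the resulting precision-corrected minimum is pulled closer to the minimum of the estimated curve.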



^4 Of course, an ideal but infeasible choice of $\hat V$ would be to simply use $V_0$.
3. Theory of Inference on Intersection Bounds

3.1. Theory under High-Level Conditions. We begin by presenting a set of simple 
high-level conditions, under which we demonstrate validity and general applicability of 
our inferential approach. In subsequent sections we verify these conditions for parametric 
and nonparametric estimators of the bound-generating function $\theta(v)$.

In the conditions that follow, the studentized stochastic process defined in (2.4) plays
a particularly important role. Moreover, we also employ a general superset estimate $\hat V$
consistent for the argmin superset $V$, which is a set that contains the argmin set

$V_0 = \arg\inf_{v \in \mathcal{V}} \theta(v)$,

that is, $V_0 \subseteq V$. We require that the superset estimate $\hat V$ be consistent for the superset
$V$ with respect to the Hausdorff distance, i.e.

$d_H(\hat V, V) := \max\{\sup_{v \in \hat V} d(v, V), \sup_{v \in V} d(v, \hat V)\} \to_p 0$,

where $d(v, V) = \inf_{v' \in V} \|v - v'\|$. While it is generally desirable for the set $V$ to be small,
we shall see later that working with supersets $V$ of the argmin set $V_0$, rather than with
the argmin set itself, turns out to be essential in non-parametric settings.
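
For finite sets of points on the real line, the Hausdorff distance used above reduces to a few lines of code. The sketch below is an illustration with hypothetical inputs; the function name `hausdorff` is not from the paper.

```python
from typing import Sequence


def hausdorff(a: Sequence[float], b: Sequence[float]) -> float:
    """Hausdorff distance between two finite, non-empty sets of real numbers."""

    def dist_to_set(v: float, s: Sequence[float]) -> float:
        # d(v, S) = inf over v' in S of |v - v'|
        return min(abs(v - u) for u in s)

    return max(
        max(dist_to_set(v, b) for v in a),  # sup over a of d(v, b)
        max(dist_to_set(v, a) for v in b),  # sup over b of d(v, a)
    )


d1 = hausdorff([0.0, 1.0], [0.0, 0.5])   # the point 1.0 is 0.5 away from {0.0, 0.5}
d2 = hausdorff([0.2, 0.4], [0.2, 0.4])   # identical sets
print(d1, d2)
```

Both one-sided terms matter: a set estimate that merely contains $V$ makes the second term zero, but the first term is small only if the estimate does not extend far beyond $V$.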

We are now prepared to state the following conditions on the studentized stochastic
process and estimators of the superset.^5

Condition C.1. Let $V$ be a superset of $V_0$, that is, $V_0 \subseteq V$. For some sequence of
nonnegative normalizing constants $a_n$ and $b_n$, we have that the normalized supremum
of the studentized process $a_n \cdot (\sup_{v \in V} Z_n(v) - b_n)$ can either be (a) approximated in
distribution by a variable $\mathcal{E}_\infty(V)$, namely

$a_n \cdot (\sup_{v \in V} Z_n(v) - b_n) =_d \mathcal{E}_\infty(V) + o_p(1)$,

or (b) approximately majorized in distribution by a variable $\mathcal{E}_\infty(V)$, namely

$a_n \cdot (\sup_{v \in V} Z_n(v) - b_n) \le_d \mathcal{E}_\infty(V) + o_p(1)$,

^5 The notation used in Condition C.1 is as follows: for a sequence of random variables $X_n$ and a random
variable $X$, we use $X_n =_d X + o_p(1)$ to denote that there exist a sequence of random variables $\tilde X_n$ and
a random variable $\tilde X$ on the same probability space satisfying $\tilde X_n =_d X_n$ for each $n$, $\tilde X =_d X$, and
$\tilde X_n - \tilde X \to_p 0$, where $\tilde X =_d X$ denotes that the distribution of $\tilde X$ is the same as that of $X$. Similarly,
we use $X_n \le_d Y_n + o_p(1)$ to mean that there exist $\tilde X_n$ and $\tilde Y_n$ on the same probability space satisfying
$X_n \le_d \tilde X_n$, $\tilde Y_n =_d Y_n$ for each $n$, and $\tilde X_n - \tilde Y_n \to_p 0$, where $X \le_d \tilde X$ denotes that the distribution of $X$
is first-order stochastically dominated by that of $\tilde X$.
where $\mathcal{E}_\infty(V)$ has a known continuous distribution function.

This is a basic asymptotic condition, which either requires standard convergence in
distribution or majorization in distribution by a limit random variable with a continuous
distribution function. We also consider the following generalization of C.1, which is useful
for our purposes.

Condition C*.1. Let $V$ be a superset of $V_0$, that is, $V_0 \subseteq V$. For some sequence of
nonnegative normalizing constants $a_n$ and $b_n$, we have that the normalized supremum
of the studentized process $a_n \cdot (\sup_{v \in V} Z_n(v) - b_n)$ can either be (a) approximated in
distribution by a variable $\mathcal{E}_n(V)$, namely

$a_n \cdot (\sup_{v \in V} Z_n(v) - b_n) =_d \mathcal{E}_n(V) + o_p(1)$,

or (b) approximately majorized in distribution by a variable $\mathcal{E}_n(V)$, namely

$a_n \cdot (\sup_{v \in V} Z_n(v) - b_n) \le_d \mathcal{E}_n(V) + o_p(1)$,

where $\mathcal{E}_n(V) = O_p(1)$ has a known distribution and satisfies a sequential continuity or
anti-concentration property, specifically that for any sequence $\epsilon_n \searrow 0$,

$\sup_{x \in \mathbb{R}} P[|\mathcal{E}_n(V) - x| \le \epsilon_n] \to 0$. (3.1)

Conditions C.1 or C*.1 justify the use of quantiles of $\mathcal{E}_\infty(V)$ or $\mathcal{E}_n(V)$, respectively, for
inference. Condition C.1(a) requires that the supremum of the normalized process $Z_n(v)$,
appropriately studentized, converges in distribution to the random variable $\mathcal{E}_\infty(V)$. As
shown in section 3, it applies with either parametric or nonparametric kernel estimation
of the bound-generating function $\theta(\cdot)$. Condition C.1(b) is a weaker condition that
does not require the studentized supremum of $Z_n(v)$ to have an asymptotic distribution,
but only requires that its distribution can be majorized by that of $\mathcal{E}_\infty(V)$. Section
3.4 establishes its validity for nonparametric series estimation of the bound-generating
function. Note that by the term "known distribution," in reference to $\mathcal{E}_\infty(V_0)$ and $\mathcal{E}_n(V)$,
we mean a distribution whose parameters can be estimated consistently. Also, instead of
using standard convergence in distribution notation, we employ strong approximation,
which is without loss of generality relative to the former due to the
Skorohod-Dudley-Wichura construction. In general, the normalizing constants $a_n$ and $b_n$ may depend on
$V$ and can be different depending on which of C.1(a) and C.1(b) holds.
Condition C*.1 is a generalization of C.1, which allows for the use of some intermediate
or penultimate approximations for inference. For example, in the case of series
approximation we can approximate the supremum of the process $Z_n(v)$ by the supremum $\mathcal{E}_n(V)$
of a Gaussian process, which does not in general converge to a fixed random variable,
but can instead be majorized in distribution by an exponential random variable $\mathcal{E}_\infty(V)$.
However, this majorization can be conservative. We can instead use the quantiles of
$\mathcal{E}_n(V)$ for inference, which in our experience provides a more accurate, less
conservative approximation. In order for the penultimate approach to be valid, we require the
sequential continuity, or anti-concentration, property (3.1) for the sequence of random
variables $\mathcal{E}_n(V)$. This property is needed for the disappearance of the effect of
approximation errors in critical values on the coverage probabilities. If $\mathcal{E}_n(V)$ has a continuous
limit distribution the anti-concentration property follows automatically. If $\mathcal{E}_n(V)$ does
not have a limit distribution, verification of this property is a harder problem, which can
be achieved either numerically or, in some limited cases, analytically using exact versions 
of Hotelling's tubing method. Analytical limitations arise because little is known about 
the anti-concentration properties of the suprema of a sequence of Gaussian processes, 
in contrast to a vast knowledge on the concentration properties of such processes (see 
however Rudelson and Vershynin (2007), Rudelson and Vershynin (2008) and Tao and 
Vu (2009) for a discussion of anti-concentration inequalities for some "simpler" related 
problems) . 

The next condition deals with the effect of estimating the approximate argmin sets. 

Condition C.2. Let $\hat V$ denote any sequence of sets, possibly data-dependent, that
contain a superset $V$ of $V_0$, with probability approaching one, and that converge to $V$ at
the rate $r_n$, i.e., $d_H(\hat V, V) \le O_p(r_n)$, where $r_n$ is a sequence of constants converging
to zero. Also, let $\hat a_n$ and $\hat b_n$ denote corresponding, possibly data-dependent, normalizing
constants. Then the normalized supremum of the studentized stochastic process is
insensitive to the replacement of the superset $V$ and normalizing constants $(a_n, b_n)$ with the
estimates $\hat V$ and $(\hat a_n, \hat b_n)$, namely

$\hat a_n \cdot (\sup_{v \in \hat V} Z_n(v) - \hat b_n) - a_n \cdot (\sup_{v \in V} Z_n(v) - b_n) \to_p 0$.

This assumption allows for a data-dependent choice of $\hat V$, but requires that $\hat V$ should
eventually settle down at $V$, without affecting the supremum of the studentized
stochastic process. In Section 3.6, we construct such estimators from the level sets of the
estimated bound-generating function $v \mapsto \hat\theta(v)$ and show that these estimators converge
to the level sets of $v \mapsto \theta(v)$ at a rate sufficiently fast not to affect the behavior of
the supremum of the estimated process. In nonparametric settings, this may sometimes
require that level sets are strictly larger than the argmin set $V_0$.

We now state our first main result under the above conditions. 
Theorem 1 (Main Result Under C.1-C.2). Let

\[ \hat\theta_p = \inf_{v \in \hat V} \big[ \hat\theta(v) + [\hat b_n + \hat c(p)/\hat a_n]\, s(v) \big], \]

where $\hat c(p)$ is defined below.

1. Suppose that conditions C.1(b) or C*.1(b) and C.2 hold and that $\hat c(p)$ is a consistent
upper bound on $c_n(p)$ := the $p$-th quantile of $\mathcal{E}_n(V)$, where $n = \infty$ under C.1(b),
namely

\[ \hat c(p) \ge c_n(p) + o_p(1). \]

Then we have that the estimator $\hat\theta_p$ is downward $p$-quantile unbiased, namely

\[ \liminf_{n \to \infty} P[\theta_0 \le \hat\theta_p] \ge p. \]

2. Suppose conditions C.1(a) or C*.1(a) and C.2 hold with $V = V_0$ and that $\hat c(p)$ is
a consistent estimate of $c_n(p)$ := the $p$-th quantile of $\mathcal{E}_n(V_0)$, where $n = \infty$ under C.1(a),
namely

\[ \hat c(p) = c_n(p) + o_p(1). \]

Then we have that the estimator $\hat\theta_p$ is $p$-quantile unbiased, namely

\[ \lim_{n \to \infty} P[\theta_0 \le \hat\theta_p] = p. \]

Thus, the quantity $\hat\theta_p$ can be used to provide a one-sided confidence interval for $\theta_0$,
since $\lim_{n \to \infty} P[\theta_0 \le \hat\theta_p] \ge p$, with equality under C.1(a) or C*.1(a). Moreover, $\hat\theta_{1/2}$ is a
median downward-unbiased estimator for $\theta_0$ in the sense that

\[ \lim_{n \to \infty} P[\theta_0 \le \hat\theta_{1/2}] \ge 1/2. \]

In words, the asymptotic probability that the estimator $\hat\theta_{1/2}$ lies above the true $\theta_0$ is at
least a half.
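
As an illustration of the estimator in Theorem 1, the following minimal sketch evaluates the precision-corrected bound, the infimum over a grid of $\hat\theta(v) + [\hat b_n + \hat c(p)/\hat a_n] s(v)$. The function name, the use of a finite grid in place of $\hat V$, and all numerical inputs are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def precision_corrected_bound(theta_hat, s, a_n, b_n, c_p):
    """Minimum over grid points of theta_hat(v) + (b_n + c_p / a_n) * s(v).

    theta_hat, s : arrays holding the estimated bound-generating function and
                   its pointwise standard error on a grid standing in for V-hat.
    a_n, b_n     : normalizing constants (hypothetical inputs).
    c_p          : critical value c(p) (hypothetical input).
    Returns the corrected bound and the index of the grid point attaining it.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    s = np.asarray(s, dtype=float)
    k = b_n + c_p / a_n                  # multiplier on the standard error
    corrected = theta_hat + k * s        # precision-corrected curve
    idx = int(np.argmin(corrected))
    return float(corrected[idx]), idx
```

In the toy call `precision_corrected_bound([1.0, 0.9, 1.2], [0.1, 0.5, 0.1], 1.0, 0.0, 1.0)`, the analog estimator would pick the second grid point (value 0.9), whereas the corrected curve is minimized at the first point, because the second point is estimated imprecisely.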

3.2. Donsker and Non-Donsker Cases. We specialize the high-level conditions de- 
veloped above into two general cases: 

(1) The Donsker case, where the studentized process converges to a fixed continuous 
Gaussian process. This immediately implies the convergence of suprema as well as 
the insensitivity of the supremum to replacement of the argmin sets with consistent
estimates. This case primarily covers parametric estimation and includes a great variety 
of practical procedures, including the "finite-support case", where V is a finite set. 

(2) The non-Donsker case, where the studentized process does not converge to a 
fixed continuous Gaussian process, but may instead be approximated by a sequence of 
Gaussian processes or other pivotal processes. This case is harder, but it also leads to a 
majorization of the supremum by tractable random variables as well as insensitivity to 
replacement of the argmin supersets with consistent estimates. This case primarily covers 
nonparametric estimation of the boundary and includes a rich variety of procedures, 
ranging from kernel to series methods. 

Formally we define the Donsker case as follows. 

Condition D.1. The normalized stochastic process $Z_n$ converges to a continuous Gaussian
process $Z_\infty$ with a known distribution and a non-degenerate covariance function, in
the space of bounded functions on $V$, namely

\[ Z_n(\cdot) =_d Z_\infty(\cdot) + o_p(1) \quad \text{in } \ell^\infty(V). \]

It is worth noting here that given weak convergence, convergence in probability is 
without loss of generality due to the Skorohod-Dudley-Wichura construction. According
to the latter, given weak convergence, we can always find a suitably enriched probability 
space on which convergence in probability takes place. 

The Donsker condition is widely applicable in parametric and semi-parametric estimation
problems. It leads to immediate verification of the high-level conditions C.1 and
C.2.

Lemma 1. The Donsker condition D.1 implies conditions C.1(a) with normalizing
constants $a_n = \hat a_n = 1$ and $b_n = \hat b_n = 0$ and the limit variable $\mathcal{E}_\infty(V_0) = \sup_{v \in V_0} Z_\infty(v)$
with a continuous distribution and condition C.2, including the ideal case $V = V_0$, with
any vanishing sequence of positive constants $r_n = o(1)$.

Next, we formally define the non-Donsker cases as follows. 

Condition N. The normalized stochastic process $Z_n$ can be approximated uniformly by
a sequence of penultimate processes $Z_n'$, which is a sequence of either Gaussian processes
or some other pivotal processes with a known distribution, namely

\[ a_n \sup_{v \in V} |Z_n(v) - Z_n'(v)| = o_p(1), \]

for some sequence of constants $a_n$. Conditions C.1 and C.2 or C*.1 and C.2 hold with
$Z_n(v)$ replaced by the sequence of penultimate processes $Z_n'(v)$. The resulting conditions
are referred to as Conditions N.1, N*.1 and N.2, respectively.

This condition requires the studentized stochastic process to be approximated by a 
sequence of pivotal processes, whose behavior is sufficiently regular to allow verification 
of the high-level conditions C.1 and C.2. Below, we show how this condition is fulfilled
for series and kernel estimators. 

Lemma 2. Condition N implies C.1 (or C*.1) and C.2.

3.3. Parametric Estimation of $v \mapsto \theta(v)$. In this subsection, we show that the conditions
developed above apply to various parametric estimation methods of $v \mapsto \theta(v)$.
Parametric estimation is an important practical case, and it turns out to be quite 
tractable. In particular, it includes the case where the set V is finite. We formally 
state the conditions required for parametric estimation in the following: 

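Condition P. $\theta(v) = \theta(v, \gamma_0)$, where $\theta(v, \gamma)$ is a known function of a finite-dimensional
parameter $\gamma \in \mathbb{R}^k$, and $\partial\theta(v, \gamma)/\partial\gamma$ is uniformly continuous in $(\gamma, v)$ for all $\gamma$ in a
neighborhood of $\gamma_0$, $v \in V$.

P.1 An estimate $\hat\gamma$ is available such that

\[ \sqrt{n}(\hat\gamma - \gamma_0) =_d \Omega^{1/2} \mathcal{N} + o_p(1), \quad \mathcal{N} =_d N(0, I), \]

where for

\[ g(v)' = \frac{\partial\theta(v, \gamma_0)}{\partial\gamma}' \, \Omega^{1/2} \]

the norm $\|g(v)\|$ is bounded uniformly in $v$ above and away from zero.

P.2 There is an estimator for the standard deviation of $\theta(v, \hat\gamma)$ that satisfies

\[ s(v) = \frac{\|g(v)\|}{\sqrt{n}} (1 + o_p(1)), \]

uniformly in $v \in V$. For example, if there is an estimate $\hat\Omega$ such that $\hat\Omega = \Omega + o_p(1)$,
then such an estimate of precision is given by $s(v) = \|\hat g(v)\|/\sqrt{n}$ with

\[ \hat g(v)' = \frac{\partial\theta(v, \hat\gamma)}{\partial\gamma}' \, \hat\Omega^{1/2}. \]

For the case of a finite number of support points, one can set $\theta(v) = \sum_{j=1}^{J} \gamma_j 1(v = v_j)$,
where $(v_1, \ldots, v_J)$ are the support points and $1(\cdot)$ is the usual indicator function. The
following lemma shows that Condition D follows under the conditions stated above.

Lemma 3. Condition P implies Condition D with the limit process

\[ Z_\infty(v) = \frac{g(v)'}{\|g(v)\|} \mathcal{N}. \]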

3.4. Nonparametric Estimation of $\theta(v)$ via Series. Series estimation is effectively
like parametric estimation, but the dimension of the estimated parameter tends to infinity
and bias arises due to approximation based on a finite number of basis functions.
If we select the number of terms in the series expansion so that the estimation error is 
of larger magnitude than the approximation error, then the analysis closely mimics that 
of the parametric case. 

Condition S. Suppose that the series estimator $\hat\theta(v)$ for the function $\theta(v)$ has the form

\[ \hat\theta(v) = p_n(v)' \hat\beta, \]

where $p_n(v) := (p_1(v), \ldots, p_K(v))'$ is a collection of $K$-dimensional approximating functions,
$K \to \infty$, $K = o(n)$, and $\hat\beta$ is a $K$-vector of series regression estimates. Furthermore,
assume the following conditions hold.

S.1 The estimator satisfies the following linearization and strong approximation condition
in $\ell^\infty(V)$:

\[ \frac{\sqrt{n}(\hat\theta(v) - \theta(v))}{\|g_n(v)\|} =_d \frac{g_n(v)'}{\|g_n(v)\|} \mathcal{N}_n + R_n(v), \]

where

\[ g_n(v)' = p_n(v)' \Omega_n^{1/2}, \quad \mathcal{N}_n =_d N(0, I_K), \quad \sup_{v \in V} |R_n(v)| = o_p(1/a_n), \]

where $\Omega_n$ are positive definite matrices, and $\|\nabla_v (g_n(v)/\|g_n(v)\|)\|$ is of polynomial
growth in $K$ uniformly in $v \in V$.

S.2 There exists an estimate $s(v)$ of precision such that

\[ s(v) = \frac{\|g_n(v)\|}{\sqrt{n}} (1 + o_p(1)), \]

uniformly in $v \in V$. For example, if there is an estimate $\hat\Omega_n$ such that
$\|\hat\Omega_n - \Omega_n\| = o_p(1)$, then such an estimate of precision is given by $s(v) = \|\hat g_n(v)\|/\sqrt{n}$ with

\[ \hat g_n(v)' = p_n(v)' \hat\Omega_n^{1/2}. \]
Assumption S.1 embeds a number of requirements. The first is that the series estimator
admits a linear approximation in terms of a zero-mean vector $\tilde{\mathcal{N}}_n \sim (0, I)$, which
is typically a rescaled sum of vectors. The second is undersmoothing, namely that the
approximation bias is asymptotically negligible. The third is the approximation of the
vector $\tilde{\mathcal{N}}_n$ by a normal vector $\mathcal{N}_n =_d N(0, I)$. This approximation is immediate, for
example, in series regression with normal errors, but it also applies considerably more
generally. Indeed, using the coupling of Yurinskii (1977), we provide sufficient primitive
conditions for this approximation in Appendix B.1.

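Lemma 4. Assume that $\mathrm{mes}(V) > 0$, where $\mathrm{mes}(V)$ denotes the Lebesgue measure of
$V$. Condition S implies condition N*.1(b) with the penultimate process

\[ Z_n'(v) = \alpha_n(v)' \mathcal{N}_n, \quad \mathcal{N}_n =_d N(0, I_K), \quad \alpha_n(v) := \frac{g_n(v)}{\|g_n(v)\|}, \]

\[ \mathcal{E}_n(V) = a_n \Big[ \sup_{v \in V} Z_n'(v) - a_n \Big], \quad a_n = b_n = \sqrt{2 d \log(2L/\sqrt{2\pi})}, \]

\[ L = \sup_{v \in V} \|\nabla\alpha_n(v)\| \cdot \mathrm{diam}(V). \]

When $d = 1$, we can also use a sharper constant $a_n = \sqrt{2 \log(\kappa_n(V)/(2\pi))}$, where $\kappa_n(V) =
\int_V \|\nabla\alpha_n(v)\| dv$. Furthermore, condition S implies condition N.1(b), as the sequence
of random variables $\mathcal{E}_n(V)$ is stochastically dominated in distribution by the standard
exponential random variable:

\[ \mathcal{E}_n(V) \le_d \mathcal{E}_\infty + o_p(1), \quad P[\mathcal{E}_\infty > p] = \exp(-p). \]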

Lemma 4 provides a majorizing limiting variable $\mathcal{E}_\infty$ for the normalized supremum
of the studentized empirical process $Z_n$. It also provides a penultimate approximation
$\mathcal{E}_n(V)$ for this supremum. We can use these results for construction of critical values.
The $p$-th quantile of $\mathcal{E}_\infty$ is given by

\[ c_\infty(p) = -\log(1 - p). \]

Therefore, we can set

\[ k_{1-\alpha} = a_n(V) + \frac{c_\infty(1 - \alpha)}{a_n(V)}. \quad (3.2) \]

Alternatively, we can base inference on quantiles of $\mathcal{E}_n(V)$ and estimate them numerically.
We describe the practical details of simulation of critical values in Appendix C. It is not
restrictive to assume that $V$ has strictly positive measure. Even if $V_0$ is a singleton, we
can select $V$ to be a superset of $V_0$ of positive measure, in which case our method for
inference is valid but conservative.
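
The analytic critical value for the series case can be sketched as follows, assuming the form $k_{1-\alpha} = a_n + c_\infty(1-\alpha)/a_n$ with the standard exponential quantile $c_\infty(p) = -\log(1-p)$ stated above. The function name is hypothetical, and the normalizing constant `a_n` is treated as a user-supplied input rather than computed from the basis functions.

```python
import math

def series_critical_value(a_n, alpha):
    """Analytic critical value a_n + c_inf(1 - alpha) / a_n.

    c_inf(p) = -log(1 - p) is the p-th quantile of a standard exponential
    variable, so c_inf(1 - alpha) = -log(alpha).
    a_n must be a positive normalizing constant (hypothetical input).
    """
    if a_n <= 0 or not (0 < alpha < 1):
        raise ValueError("need a_n > 0 and 0 < alpha < 1")
    c_inf = -math.log(alpha)
    return a_n + c_inf / a_n
```

For instance, `series_critical_value(2.0, 0.05)` adds $\log(20)/2$ to the normalizing constant 2.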
Lemma 5. Assume that $\mathrm{mes}(V) > 0$. Let $\hat a_n = \hat b_n := a_n(\hat V)$ and $a_n = b_n := a_n(V)$. Then,
condition N.2 holds if $a_n(\hat V)^2 - a_n(V)^2 \to_p 0$ and the following growth condition holds:

\[ a_n \cdot r_n \cdot \sup_{v \in V} \|\nabla\alpha_n(v)\| \to 0. \quad (3.3) \]

The requirement that $a_n(\hat V)^2 - a_n(V)^2 \to_p 0$ is a weak assumption. For example,
consider $a_n = \sqrt{2 \log(\kappa_n(V)/(2\pi))}$ in one-dimensional settings. In this case, $a_n(\hat V)^2 - a_n(V)^2 \to_p 0$
is satisfied if $\log(\kappa_n(\hat V)/\kappa_n(V)) \to_p 0$. If $1 \lesssim \kappa_n(V)$, which is the case when $V$ has non-zero
Lebesgue measure, then $|\kappa_n(\hat V)/\kappa_n(V) - 1| \lesssim_p r_n \cdot \sup_{v \in V} \|\nabla\alpha_n(v)\| \to 0$. For typical series the
upper bound on $\|\nabla\alpha_n(v)\|$ is of order $\sqrt{K}$. If also $r_n = (\log n)^c (K/n)^{1/(2\rho)}$ for some $c > 0$,
then the growth condition (3.3) reduces to

\[ (\log n)^{c + 1/2} (K/n)^{1/(2\rho)} K^{1/2} \to 0. \]

When the parameter $\rho = 1$, this amounts to a rather mild condition $K^2 (\log n)^{c'}/n \to 0$,
for some $c' > 0$, on the growth of the number of series terms. The value $\rho = 1$ is plausible
when the superset $V$ is the $\epsilon$-argmin of the bound-generating function for some $\epsilon > 0$,
as we discuss in Section 3.6.



3.5. Nonparametric Estimation of $\theta(v)$ via local methods. In this section we
provide conditions under which a kernel-type estimator of the bound-generating function 
satisfies Conditions N.l and N.2 and we also describe how to obtain critical values k. 
Kernel-type estimators include standard kernel estimators as well as local polynomial 
estimators. 

For any positive integer $d$ and a $d$-dimensional vector $u = (u_1, \ldots, u_d)$, let $\mathbf{K}(u) =
\prod_{i=1}^{d} K(u_i)$, where $K$ is a kernel function on $\mathbb{R}$. We assume that a kernel-type estimator
$\hat\theta(v)$ of a bound-generating function $\theta(v)$ satisfies the following conditions. These
conditions cover local estimation of bound-generating functions defined as conditional
expectation functions, in which case given i.i.d. random variables $(Y_i, V_i, U_i)$ we have
$\theta(v) = E[Y_i | V_i = v]$, and $\sigma^2(v) = \mathrm{Var}(Y_i | V_i = v)$ in the expression given below. These
conditions also cover local estimation of bound-generating functions defined as conditional
quantile functions, although in this case the underlying interpretation of parameters
differs.
Condition K.1. Assume that the estimator satisfies the following linearization and
strong approximation condition in $\ell^\infty(V)$:

\[ \frac{(n h_n^d)^{1/2} [\hat\theta(v) - \theta(v)]}{\|w_n(v)\|} =_d \frac{w_n(v)' U_n}{\|w_n(v)\|} + R_n(v), \]

where

\[ U_n =_d N_n(0, I) \text{ conditional on } (V_1, \ldots, V_n), \quad \sup_{v \in V} |R_n(v)| = o_p(a_n^{-1}), \]

$w_n(v)$ is typically an $n$-dimensional vector of the form

\[ w_{n,i}(v) = \frac{\sigma(V_i)\, \mathbf{K}((V_i - v)/h_n)}{(n h_n^d)^{1/2} f_V(v)}, \quad i = 1, \ldots, n, \]

$K$ is a kernel function that is bounded and is continuously differentiable with a bounded
derivative, $h_n$ is a bandwidth that satisfies $h_n \to 0$ and $\log n/(n h_n^d)^{1/2} \to 0$, $\sigma^2(v)$ is
uniformly continuous, bounded and also bounded below from zero, $f_V(v)$ is the probability
density function for $V_i$, which is bounded away from zero and has a bounded derivative,
$N_n(0, I)$ denotes the $n$-dimensional multivariate normal distribution with variance
the identity matrix, and $V_i$ are i.i.d.

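Condition K.2. There exists an estimate of precision such that

\[ s(v) = \frac{\|w_n(v)\|}{\sqrt{n h_n^d}} (1 + o_p(1)), \]

uniformly in $v \in V$. For example, a consistent estimate of precision is given by $s(v) =
\|\hat w_n(v)\|/\sqrt{n h_n^d}$, where

\[ \hat w_{n,i}(v) = \frac{\hat\sigma(V_i)\, \mathbf{K}((V_i - v)/h_n)}{(n h_n^d)^{1/2} \hat f_V(v)}, \]

where $\sup_{v \in V} |\hat\sigma(v) - \sigma(v)| = o_p(1)$ and $\sup_{v \in V} |\hat f_V(v) - f_V(v)| = o_p(1)$.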



Conditions K.1 and K.2 embed a number of requirements. As was the case for series
estimators, a simple immediate case is nonparametric mean regression with normal errors
that are mean independent of regressors with known conditional variance $\sigma^2(v)$. It is
not difficult to extend conditions K.1 and K.2 to more general cases with non-normal
errors, an unknown conditional variance function, and additional covariates other than 
V. In Appendix B.2, we give sufficient conditions for strong approximation of kernel-type 
estimators of conditional expectation functions. 
In order to provide an analytic approximation for the asymptotic distribution of the
supremum of the studentized estimation process, let $\rho_d(s) = \prod_{j=1}^{d} \rho(s_j)$, where $s =
(s_1, \ldots, s_d)$ is a $d$-dimensional vector and

\[ \rho(s_j) = \frac{\int K(u) K(u - s_j)\, du}{\int K^2(u)\, du} \]

for each $j$.



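Lemma 6. Let $a_n(V) = b_n(V)$ be the largest solution to the following equation:

\[ \mathrm{mes}(V)\, h_n^{-d}\, \lambda^{d/2} (2\pi)^{-(d+1)/2} a_n^{d-1} \exp(-a_n^2/2) = 1, \quad (3.7) \]

where

\[ \lambda = \frac{-\int K(u) K''(u)\, du}{\int K^2(u)\, du}. \]

Assume that K.1 and K.2 hold and $\mathrm{mes}(V) > 0$. Then, condition N.1(a) holds with the
penultimate process

\[ Z_n'(v) = \frac{w_n(v)' U_n}{\|w_n(v)\|}. \]

Furthermore, we have that

\[ Z_n'(v) =_d Z_n''(h_n^{-1} v) + R_n'(v), \quad \sup_{v \in V} |R_n'(v)| = o_p(a_n(V)^{-1}), \]

where $\{Z_n''(h_n^{-1} v) : v \in V\}$ is a sequence of Gaussian processes with continuous sample
paths such that

\[ E[Z_n''(s)] = 0, \quad E[Z_n''(s_1) Z_n''(s_2)] = \rho_d(s_1 - s_2) \quad \text{for } s, s_1, s_2 \in V_n := h_n^{-1} V. \]

Finally, we have that

\[ \mathcal{E}_n(V) := a_n(V) \Big[ \sup_{v \in V} Z_n''(h_n^{-1} v) - a_n(V) \Big] =_d \mathcal{E}_\infty + o_p(1), \quad (3.8) \]

where $\mathcal{E}_\infty$ has the type I extreme-value distribution

\[ P[\mathcal{E}_\infty \le x] = \exp(-\exp(-x)). \]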

Lemma 6 provides a majorizing limiting variable $\mathcal{E}_\infty$ for the normalized supremum
of the studentized empirical process $Z_n$. It also provides a penultimate approximation
$\mathcal{E}_n(V)$ for this supremum. We can use these results for construction of critical values.
For example, when $d = 1$ and $V = [a, b]$,

\[ a_n(V) = \Big( 2 \log(h_n^{-1}(b - a)) + 2 \log\frac{\lambda^{1/2}}{2\pi} \Big)^{1/2}. \quad (3.9) \]

The $1 - \alpha$ quantile of $\mathcal{E}_\infty$ is given by

\[ c_\infty(1 - \alpha) = -\log\log(1 - \alpha)^{-1}. \]

Then we set

\[ k_{1-\alpha} = a_n(V) + \frac{c_\infty(1 - \alpha)}{a_n(V)}, \quad (3.10) \]

which consistently estimates the $1 - \alpha$ quantile of $\mathcal{E}_\infty(V)$. Alternatively, we can base
inference on quantiles of $\mathcal{E}_n(V)$ and estimate them numerically. We describe the practical
details for the simulation of critical values in Appendix C. Note that it is not restrictive
to assume that $V$ has strictly positive measure. Even if $V_0$ is a singleton, we can select
$V$ to be a superset of $V_0$ of positive measure, in which case our method for inference is
valid but conservative.

It is possible to construct an asymptotically valid, alternative critical value. Equation
(A.6) in the proof of Theorem 6 suggests that we might construct an alternative critical
value by using the leading term in equation (A.6). In other words, instead of using
quantiles of $\mathcal{E}_\infty(V)$, we can use quantiles of the following distribution-like function:

\[ \exp\Big\{ -\exp\Big( -x - \frac{x^2}{2 a_n(V)^2} \Big) \Big[ 1 + \frac{x}{a_n(V)^2} \Big]^{d-1} \Big\}. \]

For example, when $d = 1$ and $V = [a, b]$, instead of using (3.10), we can use the
alternative critical value:

\[ k'_{1-\alpha} = \big( a_n(V)^2 - 2 \log\log(1 - \alpha)^{-1} \big)^{1/2}. \quad (3.11) \]

In some contexts this approximation may behave better in finite samples than an approximation
using the extreme value distribution (see, e.g., Piterbarg (1996) and Lee,
Linton, and Whang (2009)). In addition, we can consider the following form:

\[ k''_{1-\alpha} = b_n(V) + \frac{c_\infty(1 - \alpha)}{a_n(V)}, \quad (3.12) \]

where $V = [a, b]$, $a_n(V) = \sqrt{2 \log(h_n^{-1}(b - a))}$, and $b_n(V) = a_n(V) + \log(\sqrt{\lambda}/(2\pi))/a_n(V)$.
The critical value in (3.12) is a one-sided version of equation (29) of Härdle and Linton
(1994), which seems to behave better in finite samples, compared to (3.10). The critical
values in (3.10), (3.11), and (3.12) are asymptotically equivalent.
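
The three analytic critical values for the one-dimensional kernel case can be compared numerically. The sketch below assumes the forms $a + c/a$, $(a^2 + 2c)^{1/2}$, and $b + c/\tilde a$, with $c = -\log\log(1-\alpha)^{-1}$, $a^2 = 2\log((b-a)/h) + 2\log(\sqrt{\lambda}/(2\pi))$, $\tilde a^2 = 2\log((b-a)/h)$, and $b = \tilde a + \log(\sqrt{\lambda}/(2\pi))/\tilde a$. The function name and the inputs (bandwidth `h`, interval end-points, kernel constant `lam`) are hypothetical, and the formulas are a reading of the displays above rather than a verified transcription.

```python
import math

def kernel_critical_values(h, a, b, lam, alpha):
    """Return three analytic critical values for d = 1 and V = [a, b].

    h     : bandwidth (hypothetical input)
    lam   : kernel constant lambda (hypothetical input)
    alpha : one minus the confidence level
    """
    c = -math.log(-math.log(1.0 - alpha))        # type I extreme-value quantile
    log_term = math.log(math.sqrt(lam) / (2.0 * math.pi))
    a_sq = 2.0 * math.log((b - a) / h) + 2.0 * log_term
    if a_sq <= 0 or a_sq + 2.0 * c < 0:
        raise ValueError("inputs give a non-positive normalizing constant")
    a_n = math.sqrt(a_sq)
    k_ev = a_n + c / a_n                          # extreme-value form
    k_alt = math.sqrt(a_sq + 2.0 * c)             # form using the leading term
    a_tilde = math.sqrt(2.0 * math.log((b - a) / h))
    b_n = a_tilde + log_term / a_tilde
    k_hl = b_n + c / a_tilde                      # one-sided Härdle-Linton-type form
    return k_ev, k_alt, k_hl
```

Since $(a + c/a)^2 = a^2 + 2c + c^2/a^2$, the first value is never smaller than the second, and the gap shrinks as $a$ grows, which is one way to see that the two are asymptotically equivalent.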
The following lemma provides sufficient conditions for Condition N.2.

Lemma 7. Assume that $\mathrm{mes}(V) > 0$. Let $\alpha_n(v) := w_n(v)/\|w_n(v)\|$. Also, let $\hat a_n =
\hat b_n := a_n(\hat V)$ and $a_n = b_n := a_n(V)$. Then, condition N.2 holds if $a_n(\hat V)^2 - a_n(V)^2 \to_p 0$ and the
following growth condition holds:

\[ a_n(V) \cdot r_n \cdot \sup_{v \in V} \|\nabla\alpha_n(v)\| \to 0. \quad (3.13) \]

Furthermore, under Condition K, $\sup_{v \in V} \|\nabla\alpha_n(v)\| = O_p(h_n^{-1})$.

As was the case for series estimation, the requirement that $a_n(\hat V)^2 - a_n(V)^2 \to_p 0$ is a
weak assumption. For example, consider $a_n(V)$ in (3.9). In this case, $a_n(\hat V)^2 - a_n(V)^2 \to_p 0$
is satisfied if $\mathrm{mes}(V) > 0$ and $\mathrm{mes}(\hat V)/\mathrm{mes}(V) \to_p 1$. In Theorem 2 below we state
sufficient conditions for this. If $r_n = (\log n)^c (n h_n^d)^{-1/(2\rho)}$, then the growth condition (3.13)
holds if

\[ (\log n)^{c + 1/2} (n h_n^{d + 2\rho})^{-1/(2\rho)} \to 0. \]

When the parameter $\rho = 1$, this amounts to a rather mild condition $n h_n^{d+2}/(\log n)^{c'} \to
\infty$, for some $c' > 0$, on the growth of the bandwidth.

3.6. Estimation of $V$. Next we consider the choice and estimation of $V$, which we
choose to be the $\epsilon$-argmin of the function $\theta(v)$. In parametric cases, we can take $\epsilon = 0$,
that is $V = V_0$. In nonparametric cases, it may not always be feasible to take $\epsilon = 0$ and
attain both conditions C.1 and C.2. The reason is that the degree of identifiability of $V_0$
is decreasing in the number of smooth derivatives that the bound-generating function
$\theta(v)$ has on the boundary of $V_0$, while the rate of convergence of $\hat\theta(v) - \theta(v)$ is increasing
in this number. These two effects work to offset each other. However, we can use
$V = V_\epsilon$, the $\epsilon$-argmin, whose degree of identifiability for $\epsilon > 0$, under some reasonable
conditions, does not depend on the number of smooth derivatives.

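Condition V. There are two parts:

V.1 The estimator $\hat\theta(v)$ satisfies

\[ \sup_{v \in V} |\hat\theta(v) - \theta(v)|/s(v) = O_p(c_n), \quad \text{where } c_n \ge 1, \]

for example, $c_n = a_n^{-1} + b_n$ under the conditions C.1 and C.2. Also

\[ l_n := 2\sqrt{\log n} \cdot \sup_{v \in V} s(v) \]

satisfies $\gamma_n := l_n \cdot c_n \to 0$.

V.2 The function $\theta(v)$ is separated away from $\theta_0 + \epsilon$ on the complement of the set

\[ V_\epsilon := \epsilon\text{-argmin of } \theta(v) = \{v \in V : \theta(v) \le \theta_0 + \epsilon\} \]

by a polynomial minorant in the distance from this set, namely

\[ \theta(v) - \theta_0 - \epsilon \ge (c\, d(v, V_\epsilon))^{\rho(\epsilon)} \wedge \delta \]

for any $v \notin V_\epsilon$, for some positive constant $\rho(\epsilon)$, called the degree of identifiability,
and constants $c$ and $\delta$, possibly dependent on $\epsilon$, where

\[ d(v, V_\epsilon) := \inf_{v' \in V_\epsilon} \|v - v'\|. \]

We propose the following estimator of $V_\epsilon$:

\[ \hat V_\epsilon = \{v \in V : \hat\theta(v) \le \inf_{v \in V} \hat\theta(v) + l_n c_n + \epsilon\}. \quad (3.14) \]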
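
A level-set estimator of this kind can be sketched on a grid as follows. The sketch keeps every grid point whose estimated value lies within a slack of the estimated minimum; treating the slack as $l_n c_n$ with $l_n = 2\sqrt{\log n}\,\sup_v s(v)$ is an assumption based on the definitions in Condition V.1, and the function name and grid representation are hypothetical.

```python
import numpy as np

def estimate_eps_argmin(v_grid, theta_hat, slack, eps):
    """Grid points v with theta_hat(v) <= min(theta_hat) + slack + eps.

    v_grid    : grid over V
    theta_hat : estimated bound-generating function on the grid
    slack     : data-dependent slack term (assumed here to play the role of l_n * c_n)
    eps       : the epsilon of the epsilon-argmin set
    """
    v_grid = np.asarray(v_grid, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    keep = theta_hat <= theta_hat.min() + slack + eps
    return v_grid[keep]
```

A larger slack returns a larger set, which errs toward containing $V_\epsilon$; this matches the role of the estimated set as a superset in the theory.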

Theorem 2. Suppose that conditions V.1 and V.2 hold. Then with probability converging
to one, the set $V_\epsilon$ is a subset of the estimator $\hat V_\epsilon$. Moreover, the Hausdorff distance
between these two sets approaches zero at the following rate:

Moreover, the Lebesgue measure of the difference between the two sets approaches zero 
at the following rate: 

where d is the dimension of the Euclidean space containing V. 

Thus the rate of convergence depends on the uniform rate of convergence $\gamma_n$ of $v \mapsto
\hat\theta(v)$ to $v \mapsto \theta(v)$ and on the degree of identifiability $\rho(\epsilon)$ of the $\epsilon$-argmin set $V_\epsilon$.

The following lemma presents a case where condition V.2 holds under reasonable 
conditions and the degree of identifiability p(e) is one. 

Lemma 8. Let $\epsilon > 0$ be fixed, and suppose that $V$ is a convex body in $\mathbb{R}^d$ and $V_\epsilon$ is in
the interior of $V$. Suppose that there exists a function $\eta(\cdot)$ such that

\[ \theta(v) = \max\{\eta(v), \theta_0\}, \]

where $\eta : V \to \mathbb{R}$ is continuously differentiable on $V$ with $\|\nabla\eta(v)\|$ bounded away from
zero on

\[ \partial V_\epsilon := \{v \in V : \theta(v) - \theta_0 = \eta(v) - \theta_0 = \epsilon\}. \]

Then condition V.2 holds with

\[ \rho(\epsilon) = 1, \quad c = \inf_{v \in \partial V_\epsilon} \|\nabla\eta(v)\|/2 > 0, \quad \text{and} \quad \delta = \inf_{d(v, V_\epsilon) \ge d_0} (\eta(v) - \theta_0 - \epsilon) > 0 \]

for some $d_0 > 0$.

3.7. Inference on the identified set $\Theta_I$ and on the true parameter value $\theta^*$.

We can use our one-sided confidence bands for lower and upper bounds to construct
confidence intervals for the identified set, as well as for the true parameter $\theta^*$. As in the
introduction, we suppose that the identified set is of the form $\Theta_I = [\theta_0^l, \theta_0^u]$, where $\theta_0^l =
\sup_{v \in V^l} \theta^l(v)$ is the maximum of a collection of lower bounds, and $\theta_0^u = \inf_{v \in V^u} \theta^u(v)$ the
minimum of a collection of upper bounds on parameter $\theta^*$. So far, we have described how
to consistently estimate such bounds, as well as how to construct one-sided confidence
bands. We now describe how these one-sided confidence bands can be used to construct
two-sided bands for either $\Theta_I$ or $\theta^*$.

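We can construct two-sided bands for the identified set $\Theta_I$ as follows. Let $\hat\theta_p^l$ and $\hat\theta_p^u$
denote the end-points of one-sided bands so that

\[ P(\theta_0^l \ge \hat\theta_p^l) \ge p + o(1) \quad \text{and} \quad P(\theta_0^u \le \hat\theta_p^u) \ge p + o(1). \]

Then, by Bonferroni's inequality, the region $[\hat\theta_p^l, \hat\theta_p^u]$ with $p = 1 - \alpha/2$ is an asymptotically
valid $1 - \alpha$ confidence interval for $\Theta_I$,

\[ P([\theta_0^l, \theta_0^u] \subseteq [\hat\theta_p^l, \hat\theta_p^u]) \ge 1 - P(\theta_0^l < \hat\theta_p^l) - P(\theta_0^u > \hat\theta_p^u) \ge 1 - \alpha + o(1). \quad (3.15) \]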

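We can construct two-sided bands for the true parameter value $\theta^*$ as follows: Let
$\hat\Delta_n^+ = \hat\Delta_n 1[\hat\Delta_n > 0]$, where $\hat\Delta_n = \hat\theta_{1/2}^u - \hat\theta_{1/2}^l$, and $\hat p_n = 1 - \Phi(\tau_n \hat\Delta_n^+)\alpha$, where $\Phi(\cdot)$
is the standard normal CDF, and $\tau_n$ is a sequence of constants satisfying $\tau_n \to \infty$ and
$\tau_n |\hat\Delta_n^+ - \Delta_n| \to_p 0$, where $\Delta_n = \theta_0^u - \theta_0^l$. Notice that since $1/2 \le \Phi(c) \le 1$ for $c \ge 0$, we
have that $\hat p_n \in [1 - \alpha, 1 - \alpha/2]$. Then under conditions similar to those stated below we have:

\[ \inf_{\theta^* \in [\theta_0^l, \theta_0^u]} P(\theta^* \in [\hat\theta_{\hat p_n}^l, \hat\theta_{\hat p_n}^u]) \ge 1 - \alpha + o(1). \quad (3.16) \]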
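
The data-dependent level used in this construction is simple to compute. The following sketch evaluates $1 - \Phi(\tau_n \hat\Delta_n^+)\alpha$ from the two median-unbiased end-point estimates; the function and argument names are hypothetical, and the choice of $\tau_n$ is left to the user.

```python
from statistics import NormalDist

def adaptive_level(theta_u_half, theta_l_half, tau_n, alpha):
    """Level p_n = 1 - Phi(tau_n * max(Delta_hat, 0)) * alpha.

    theta_u_half, theta_l_half : median-unbiased estimates of the upper and
                                 lower bounds (hypothetical inputs)
    tau_n                      : positive tuning sequence value
    alpha                      : one minus the nominal coverage
    """
    delta_hat = theta_u_half - theta_l_half
    delta_plus = delta_hat if delta_hat > 0 else 0.0   # positive part of the width
    return 1.0 - NormalDist().cdf(tau_n * delta_plus) * alpha
```

When the estimated width is zero or negative the level equals $1 - \alpha/2$, and when $\tau_n$ times the width is large it approaches $1 - \alpha$, so the level interpolates between the two-sided and one-sided cases.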

We note that the confidence intervals are valid uniformly with respect to the location 
of the true parameter value 9* within the bounds. Moreover, this statement allows the 
model and thus also the width of the identification regions $\Delta_n$ to change with the sample
size. Thus these confidence intervals are also valid uniformly with respect to $\Delta_n$.

Before stating the formal result, some further notation is required. In what follows we 
shall use the additional superscripts $j = u$ (for upper bound) or $j = l$ (for lower bounds)
relative to the main text. Thus, all statistics, estimators, and sets receive such indices;
moreover, we define the studentized empirical processes as follows 

where the second expression has the sign reversed. 

The following theorem provides a formal statement of the validity of our proposed 
confidence intervals for 6*. 



Theorem 3. Consider a sequence of models indexed by $n$ such that the following conditions
hold. Assume C*.1(b) holds for each $j \in \{u, l\}$, so that $a_n^j (\sup_{v \in V^j} Z_n^j(v) - b_n^j) \le_d
\mathcal{E}_n^j(V^j) + o_p(1)$, where each $\mathcal{E}_n^j(V^j) = O_p(1)$ has a known distribution and satisfies the
stated anti-concentration property. Assume that C.2 holds so that for each $j \in \{u, l\}$,
$\hat a_n^j (\sup_{v \in \hat V^j} Z_n^j(v) - \hat b_n^j) - a_n^j (\sup_{v \in V^j} Z_n^j(v) - b_n^j) \to_p 0$. Further suppose that $\hat c^j(p)$
is a consistent upper bound on $c_n^j(p)$ := the $p$-th quantile of $\mathcal{E}_n^j(V^j)$, where $n = \infty$
under C.1(b), namely for each $j$, $\hat c^j(p) \ge c_n^j(p) + o_p(1)$. Let $\tau_n$ be a sequence of positive
constants such that (A.7) holds. Then if $\Delta_n \ge 0$, (3.16) holds.



Regarding the choice of $\tau_n$, we note that since $|\hat\Delta_n^+ - \Delta_n| \to_p 0$ typically at a polynomial
rate in $n$, there are many admissible choices of $\tau_n$, for example $\tau_n = \log n$. In practice
it may be desirable to use a different choice, for example, $\tau_n = \sigma_n^{-1}/\log n$, where $\sigma_n$ is
a standardizing sequence for $\hat\Delta_n - \Delta_n$ in the sense that $\sigma_n^{-1}(\hat\Delta_n - \Delta_n) = O_p(1)$. More
specifically, $\sigma_n$ could be the standard deviation of $\hat\Delta_n - \Delta_n$. Another choice, which is
more readily available in our context, is $\sigma_n = \max[\hat\theta_{3/4}^u - \hat\theta_{1/4}^u, \hat\theta_{3/4}^l - \hat\theta_{1/4}^l]$.

The construction above employs reasoning analogous to that of Imbens and Manski 
(2004) and Stoye (2009), though the specifics differ since the former approaches do not 
apply here. The reasoning behind our construction is as follows. If the width $\Delta_n$ of
the identification region is bounded away from zero, then $\theta^*$ can be close to either the
lower bound $\theta_0^l$ or the upper bound $\theta_0^u$ but not both, so in this case the end-points
$\hat\theta_{1-\alpha}^l$ and $\hat\theta_{1-\alpha}^u$ from one-sided intervals suffice for a two-sided interval. If $\Delta_n$ is zero or
approaches zero, then $\theta^*$ can be close to both the lower bound $\theta_0^l$ and the upper bound
$\theta_0^u$ simultaneously, so in this case the more conservative end-points $\hat\theta_{1-\alpha/2}^l$ and $\hat\theta_{1-\alpha/2}^u$ are
needed for a valid two-sided confidence interval. To smoothly and robustly interpolate
between the two situations, we use the end-points $\hat\theta_{\hat p_n}^l$ and $\hat\theta_{\hat p_n}^u$ from one-sided intervals
with the level $\hat p_n \in [1 - \alpha, 1 - \alpha/2]$ varying smoothly as a function of $\hat\Delta_n$.
4. Monte Carlo Experiments

In this section we present the results of some Monte Carlo experiments that illustrate 
the finite-sample performance of our method. We consider a Monte Carlo design that is 
similar to that of Manski and Pepper (2009). In particular, we consider the lower bound 
on $\theta^* = E[Y_i(t) | V_i = v]$ under the monotone instrumental variable (MIV) assumption,
where $t$ is a treatment, $Y_i(t)$ is the corresponding potential outcome, and $V_i$ is a monotone
instrumental variable. The lower bound on $E[Y_i(t) | V_i = v]$ can be written as

\[ \max_{u \le v} E[Y_i \cdot 1\{Z_i = t\} + y_0 \cdot 1\{Z_i \ne t\} \mid V_i = u], \quad (4.1) \]

where $Y_i$ is the observed outcome, $Z_i$ is a realized treatment, and $y_0$ is the left end-point
of the support of $Y_i$; see Manski and Pepper (2009). Throughout the Monte Carlo
experiments, the parameter of interest is $\theta^* = E[Y_i(1) | V_i = 1.5]$.

4.1. Data-Generating Processes. We consider two cases of data-generating processes 
(DGPs). In the first case, which we call DGP1, $V_0 = V$ and the MIV assumption has
no identifying power. In other words, the bound-generating function is flat on $V$,
in which case the bias of the analog estimator is most acute; see Manski and Pepper
(2009). In the second case, which we call DGP2, the MIV assumption has identifying
power, and $V_0$ is a strict subset of $V$.

Specifically, for both DGPs we generated 1000 independent samples from the following 
model: 

\[ V_i \sim \mathrm{Unif}[-2, 2], \quad Z_i = 1\{\varphi_0(V_i) + \varepsilon_i > 0\}, \quad \text{and} \quad Y_i = \mu_0(V_i) + \sigma_0(V_i) U_i, \]

where $\varepsilon_i \sim N(0, 1)$, $\eta_i \sim N(0, 1)$, $U_i = \min\{\max\{-1.96, \eta_i\}, 1.96\}$, and $(V_i, \eta_i, \varepsilon_i)$ are
statistically independent, where $i = 1, \ldots, n$. For DGP1, $\varphi_0(v) = 0$, $\mu_0(v) = 0$, and
$\sigma_0(v) = |v|$. In this case, the bound-generating function

\[ \theta_l(v) := E[Y_i \cdot 1\{Z_i = 1\} + y_0 \cdot 1\{Z_i \ne 1\} \mid V_i = v] \]

is completely flat ($\theta_l(v) = -0.98$ for each $v \in V = [-2, 1.5]$). For DGP2, an alternative
specification is considered:

\[ \varphi_0(v) = v 1(v \le 1) + 1(v > 1), \quad \mu_0(v) = 2[v 1(v \le 1) + 1(v > 1)], \quad \text{and} \quad \sigma_0(v) = |v|. \]

In this case, $\theta_l(v) = \mu_0(v)\Phi[\varphi_0(v)] - 1.96\,\Phi[-\varphi_0(v)]$, where $\Phi(\cdot)$ is the standard normal
cumulative distribution function. Thus, $v \mapsto \theta_l(v)$ is strictly increasing on $[-2, 1]$ and is
flat on $[1, 2]$, and $V_0 = [1, 1.5]$ is a strict subset of $V = [-2, 1.5]$.
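
The closed form of the bound-generating function can be evaluated directly, which gives a quick check of the stated value $-0.98$ under DGP1 and of the flat segment under DGP2. The sketch below assumes the expression $\mu_0(v)\Phi[\varphi_0(v)] - 1.96\,\Phi[-\varphi_0(v)]$ applies to both designs (with $\varphi_0 = \mu_0 = 0$ for DGP1); the function name and the integer `dgp` switch are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def theta_l(v, dgp):
    """Bound-generating function mu0(v) * Phi(phi0(v)) - 1.96 * Phi(-phi0(v)).

    dgp = 1 : phi0(v) = 0 and mu0(v) = 0
    dgp = 2 : phi0(v) = v for v <= 1 and 1 otherwise; mu0(v) = 2 * phi0(v)
    """
    if dgp == 1:
        phi0, mu0 = 0.0, 0.0
    elif dgp == 2:
        phi0 = v if v <= 1 else 1.0
        mu0 = 2.0 * phi0
    else:
        raise ValueError("dgp must be 1 or 2")
    return mu0 * PHI(phi0) - 1.96 * PHI(-phi0)
```

Evaluating `theta_l(v, 1)` at any `v` reproduces the flat value, and `theta_l(1.0, 2)` equals `theta_l(1.5, 2)`, consistent with $V_0 = [1, 1.5]$ in the second design.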
We considered sample sizes $n = 500$ and $n = 1000$, and we implemented both series
and kernel-type estimators to estimate the bound-generating function $\theta_l(v)$ in (4.1). For
both estimators, we computed critical values via simulation as described in Appendix
C.2, and we implemented our method both with and without estimating $V_\epsilon$. For the
latter, the precision-corrected curve is maximized on the interval between the 5th percentile
of $V_i$ and the point 1.5. We do this in order to avoid undue influence of outliers
at the boundary of the support of $V_i$. For the former, $V_\epsilon$ is estimated by $\hat V_\epsilon$ in (3.14)
with e = 10~®, c„ = i/log n, and = 2y^\ogn ■ sup^gy s{v). 

4.2. Series Estimation. For basis functions we used cubic B-splines with knots equally
spaced over the sample quantiles of $V_i$. The number $K$ of approximating functions was
obtained by the following simple rule-of-thumb:

\[ K = \lfloor \tilde K \rfloor, \quad \tilde K := \hat K_{cv} \times n^{-1/5} \times n^{2/7}, \quad (4.2) \]

where $\lfloor a \rfloor$ is defined as the largest integer that is smaller than or equal to $a$, and $\hat K_{cv}$
is the minimizer of the leave-one-out least squares cross validation score from the set
$\{5, 6, 7, 8, 9\}$. If $\theta_l(v)$ is twice continuously differentiable, then a cross-validated $K$ has
the form $K \propto n^{1/5}$ asymptotically. Hence, the multiplicative factor $n^{-1/5} \times n^{2/7}$ in (4.2)
ensures that the bias is asymptotically negligible from under-smoothing.
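
The rule-of-thumb is a one-line computation once the cross-validated value is available. The sketch below assumes the undersmoothing factor is $n^{-1/5} \times n^{2/7}$, so that a cross-validated choice of order $n^{1/5}$ is inflated to order $n^{2/7}$; these exponents are a reading of a damaged display and should be checked against the original. The function name is hypothetical.

```python
import math

def series_terms(k_cv, n):
    """Number of series terms: floor(k_cv * n**(-1/5) * n**(2/7)).

    k_cv : cross-validated number of terms (e.g. chosen from {5, ..., 9})
    n    : sample size
    The exponents -1/5 and 2/7 are assumptions stated in the lead-in.
    """
    if k_cv <= 0 or n <= 0:
        raise ValueError("k_cv and n must be positive")
    return math.floor(k_cv * n ** (-1.0 / 5.0) * n ** (2.0 / 7.0))
```

Under these assumed exponents the factor equals $n^{3/35}$, which is about 1.7 at $n = 500$, so the rule uses noticeably more terms than cross-validation alone would select.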

We obtained the precision-corrected curve for the lower bound by subtracting the 
product of a critical value and an asymptotic pointwise standard error from the estimated 
function. At each data point of $V_i$, we computed the pointwise standard error of our
estimate using an asymptotic heteroscedasticity-robust formula. 

4.3. Kernel-Type Estimation. We used local linear smoothing since it is known to
behave better at the boundaries of the support than the standard kernel method. We
used the kernel function $K(s) = \frac{15}{16}(1 - s^2)^2 1(|s| \le 1)$ and the following rule-of-thumb
bandwidth:

\[ h = \hat h_{ROT} \times \hat s_v \times n^{1/5} \times n^{-2/7}, \quad (4.3) \]

To check the sensitivity of simulation results, we considered alternative values such as $\hat K \pm 1$ or
$\hat K \pm 2$ and found that the simulation results were not very sensitive within the local range around our
rule-of-thumb choice.
where $\hat h_{ROT}$ is the rule-of-thumb bandwidth for estimation of $\theta_l(v)$ with studentized
$V_i$, as prescribed in Section 4.2 of Fan and Gijbels (1996). The exact form of $\hat h_{ROT}$ is

\[ \hat h_{ROT} = 2.036 \left[ \frac{\tilde\sigma^2 \int w_0(v)\, dv}{n^{-1} \sum_{i=1}^{n} \{\tilde\theta_l^{(2)}(\tilde V_i)\}^2 w_0(\tilde V_i)} \right]^{1/5} n^{-1/5}, \]

where $\tilde V_i$'s are studentized $V_i$'s, $\tilde\theta_l^{(2)}(\cdot)$ is the second-order derivative of the global quartic
parametric fit of $\theta_l(v)$ with studentized $V_i$, $\tilde\sigma^2$ is the simple average of squared residuals
from the parametric fit, and $w_0(\cdot)$ is a uniform weight function that has value 1 for any $\tilde V_i$
that is between the 10th and 90th sample quantiles of $\tilde V_i$. Again, the factor $n^{1/5} \times n^{-2/7}$
is multiplied in (4.3) to ensure that the bias is asymptotically negligible due to under-smoothing.

At each data point of $V_i$, we computed an estimate of the pointwise standard error
with the asymptotic standard error formula $[n h f_V(v)]^{-1} \int K^2(u)\, du\, \sigma^2(v)$, where $f_V$ is the
density of $V$ and $\sigma^2(v)$ is the conditional variance function. We estimated $f_V$ and $\sigma^2(v)$
using the standard kernel density and regression estimators with the same bandwidth $h$.

4.4. Simulation Results. Table [H summarizes the results of Monte Carlo experiments. 
To evaluate the relative performance of our new estimator, we also consider a simple 
analog estimator of the left-hand side of (4.1).

First, we consider Monte Carlo results for the series estimator for DGPl with n = 500. 
In this case, not surprisingly, the simple analog estimator suffers from substantial biases 
since the true bound-generating function is flat on V. However, our new estimator, 
which is asymptotically median unbiased, has negligible mean bias and even smaller 
median bias. One potential concern with the new estimator is that it may have a larger 
variance due to the fact that we need to estimate the pointwise standard error for each 
point. However, it turns out that with DGPl, the new estimator has smaller standard 
deviation (SD) and also smaller mean absolute deviation. As a result, the new estimator 
enjoys substantial gains relative to the analog estimator in terms of the root mean square 
error (RMSE). It is interesting to comment on estimation of $V_\epsilon$ in this case. Since the
true argmax set $V_0$ is equal to $V$, an estimated $V_\epsilon$ should be the entire set $V$. Note
that the simulation results are similar since for many simulation draws, $\hat V_\epsilon = V$. Similar
conclusions hold for the sample size n = 1000. Note that the biases of the sample analog 



''' As in series estimation, we considered alternative bandwidths such as 0.8h or 1.2h and found that the 
quahtative findings of Monte Carlo experiments were the same. 
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estimator are still quite large, even though it is a consistent estimator. The discrepancies 
between nominal and actual coverage probabilities are not large. 

We now move to DGP2. In this case, the true argmax set $V_0$ is [1, 1.5],
our estimator is upward median unbiased, and the coverage probability is conservative.
The Monte Carlo results are consistent with asymptotic theory. As in DGP1, the sample
analog estimator suffers from upward biases. However, unlike in DGP1, our new proposed
estimator has a slightly larger RMSE than the analog estimator with n = 500. In
DGP2, the true argmax set is a strict subset of $V$. Hence, we expect that it is important
to estimate $V_0$. On average, the estimated sets were [-0.847, 1.5] when n = 500 and
[-0.147, 1.5] when n = 1,000. As can be seen from the table, our method performed
better when $V_0$ is estimated, in terms of making the bound estimates and confidence
intervals less conservative. However, there was no gain for the sample analog method
even with the estimated $V_0$. When n = 1,000 and $V_0$ is estimated, the RMSE of the new
proposed estimator is more than 10% lower than that of the sample analog estimator.

We now comment on local linear estimation. Overall, simulation results are quite
similar for both the series estimator and the local linear estimator. With DGP1, the
differences between the two estimators are negligible, but with DGP2, it seems that the
series estimator performs slightly better than the local linear estimator. We conclude
from the Monte Carlo experiments that our inference method performs well in coverage
probabilities and that our proposed estimator outperforms the sample analog estimator,
especially when the MIV assumption has no identifying power.



5. An Empirical Application 

In this section, we illustrate our inference procedure by applying it to a MIV-MTR
(monotone instrumental variable and monotone treatment response) bound of Manski and
Pepper (2000, Proposition 2). The parameter of interest is $E[Y_i(t) | V_i = v]$, where $t$ is a
treatment, $Y_i(t)$ is a potential outcome variable corresponding to treatment $t$, and $V_i$
is a scalar explanatory variable. Let $Z_i$ denote the realized treatment, which is possibly
self-selected by individuals. The source of the identification problem here is that for
each individual $i$, we only observe $Y_i = Y_i(Z_i)$ along with $(Z_i, V_i)$, but not $Y_i(t)$ with
$t \neq Z_i$. The MIV-MTR bounds take the form

$$\sup_{u \le v} E[Y_i^l | V_i = u] \le E[Y_i(t) | V_i = v] \le \inf_{u \ge v} E[Y_i^u | V_i = u],$$

where $Y_i^l = Y_i \cdot 1\{t \ge Z_i\} + y_0 \cdot 1\{t < Z_i\}$, $Y_i^u = Y_i \cdot 1\{t \le Z_i\} + y_1 \cdot 1\{t > Z_i\}$, and $[y_0, y_1]$
is the support of $Y_i$. Thus the bound-generating functions are $\theta^l(v) = E[Y_i^l | V_i = v]$ and
$\theta^u(v) = E[Y_i^u | V_i = v]$, with intersection sets $V^l = (-\infty, v]$ for the lower bound and
$V^u = [v, \infty)$ for the upper bound. Note that the MIV-MTR bounds are uninformative
if the support of $Y$ is unbounded. In the empirical illustration below, we use the sample
minimum and maximum as the boundary points of the support.
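
As a hedged illustration of how the bounding outcomes $Y_i^l$ and $Y_i^u$ can be built from data, the sketch below uses the sample minimum and maximum as the support endpoints, as described above. The function name `miv_mtr_outcomes` and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def miv_mtr_outcomes(Y, Z, t, y0=None, y1=None):
    """Build the MIV-MTR bounding outcomes for treatment level t.

    Y: observed outcomes, Z: realized treatments.
    y0, y1: support endpoints; default to the sample minimum and maximum.
    """
    Y = np.asarray(Y, dtype=float)
    Z = np.asarray(Z)
    y0 = Y.min() if y0 is None else y0
    y1 = Y.max() if y1 is None else y1
    # Lower outcome: Y is informative when t >= Z, otherwise use the lower endpoint.
    Y_lower = np.where(t >= Z, Y, y0)
    # Upper outcome: Y is informative when t <= Z, otherwise use the upper endpoint.
    Y_upper = np.where(t <= Z, Y, y1)
    return Y_lower, Y_upper

# Toy example with four individuals and treatment level t = 12.
Y_lower, Y_upper = miv_mtr_outcomes(Y=[1.0, 2.0, 3.0, 4.0], Z=[10, 12, 16, 12], t=12)
```

The bound-generating functions are then the regressions of `Y_lower` and `Y_upper` on the instrument, evaluated over $(-\infty, v]$ and $[v, \infty)$ respectively.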

We use data from the National Longitudinal Survey of Youth of 1979 (NLSY79); in
particular, we use the same data extract as Carneiro and Lee (2009), giving us n = 2044
observations. The outcome variable $Y_i$ is the logarithm of hourly wages in 1994. In
order to alleviate problems induced by possible measurement error and the occurrence
of missing wages, $Y_i$ is constructed as a five-year average of all non-missing wages reported
in the five-year interval centered on the year 1994. The treatment variable $t$ is years
of schooling. The monotone instrumental variable $V_i$ is the Armed Forces Qualification
Test score (AFQT, a measure of cognitive ability), normalized to have mean zero in the
NLSY population. The MIV assumption here stipulates that the conditional expectation
of potential log wages at any level of schooling is nondecreasing in AFQT score. The use
of AFQT as a MIV can tighten the bound, but its empirical implementation carries some
challenges since the bounds are the suprema and infima of nonparametric estimates.^
Table 2 presents descriptive statistics for our sample.

Our targets are the MIV-MTR bounds for $E[Y_i(t) | V_i = v]$ at $v = 0$ (the mean value of
AFQT) for high school graduates (t = 12) and college graduates (t = 16). We estimate
the bound-generating functions $E[Y_i^l | V_i = u]$ and $E[Y_i^u | V_i = u]$ by local linear smoothing.
For each nonparametric function, we use the same kernel function and rule-of-thumb
bandwidth as in Section 4.3. In addition, we used the critical value in (3.11) and estimated $V_0$ as
in Section 4.3.

Table 3 summarizes our empirical results. The first row shows naive sample analog
estimates, which are based on the maxima and minima of the estimated bound-generating functions.
The second row presents our median downward-unbiased (upward-unbiased) estimates
for the upper (lower) bounds. Our estimate and the analog estimate for the
upper bound of average log wages for college graduates differ substantially. The
economic implication of this difference is large: the upper bound for the return to college
(defined as $E[Y_i(16) | V_i = 0] - E[Y_i(12) | V_i = 0]$) is 2.87 - 2.12 = 0.75 based on the naive
sample analog estimates, whereas it is 3.18 - 2.03 = 1.15 based on our proposed new
estimates. The two estimates of the upper bound for the return to college thus differ by
0.40 log points (about 40%), or about 10% per year over the four years of college education.

^ The NBER working paper version of Manski and Pepper (1998) also considered AFQT as a MIV;
see Section 6.2 of that paper for a discussion of the difficulty of carrying out inference.
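
The arithmetic behind this comparison can be checked with a short sketch. The four numbers are those quoted in the text; reading each difference as an upper bound for college wages minus a lower bound for high school wages is an interpretation made here, and the dictionary keys are hypothetical labels.

```python
# Point estimates quoted in the text (log hourly wages at AFQT = 0).
analog = {"college_upper": 2.87, "high_school_lower": 2.12}
proposed = {"college_upper": 3.18, "high_school_lower": 2.03}

def return_upper_bound(est):
    """Upper bound on the return to college implied by a pair of bound estimates."""
    return est["college_upper"] - est["high_school_lower"]

YEARS_OF_COLLEGE = 16 - 12  # schooling levels t = 16 versus t = 12

gap = return_upper_bound(proposed) - return_upper_bound(analog)  # in log points
per_year_gap = gap / YEARS_OF_COLLEGE                            # log points per year
```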

We now consider 95% one-sided confidence intervals, which are given in the third row
of the table. If we combine upper and lower bounds together, then we obtain a 90%
confidence interval for average log wages of each education group. Note that the 90%
confidence interval for potential average college log wages is wider than the 90%
confidence interval for high school wages.^ This is because the estimate of the upper
bound-generating function for college wages is rather imprecise.

To illustrate the sources of the difference between the naive sample analog estimates
and our estimates, Figures 4 and 5 plot the estimated bound-generating functions
$v \mapsto E[Y_i^j | V_i = v]$, $j = l, u$, as well as the precision-corrected bound-generating functions
for college graduates. In Figure 4 we see that the sudden drop of the estimated
bound-generating function in the right tail for college graduates tightens the empirical
MIV-MTR bound, but the tightness of this bound could be due to reduced precision
of the local linear estimator at the boundary. On the other hand, our new method
automatically corrects for varying degrees of precision.
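
The effect of the precision correction on an upper bound can be sketched as follows. This is a simplified illustration: the critical value `k` is passed in as a plain number (in the paper it is determined by the inferential procedure), and the toy estimates and function names are hypothetical.

```python
import numpy as np

def analog_upper_bound(theta_hat):
    """Naive analog estimate: minimum of the estimated bound-generating function."""
    return float(np.min(theta_hat))

def precision_corrected_upper_bound(theta_hat, se, k):
    """Minimum of the estimated function shifted up by k pointwise standard errors."""
    return float(np.min(np.asarray(theta_hat) + k * np.asarray(se)))

# Toy values: the last grid point is low but very imprecisely estimated,
# mimicking a sudden drop at the boundary of the support.
theta_hat = np.array([3.0, 3.1, 2.6])
se = np.array([0.05, 0.05, 0.60])

naive = analog_upper_bound(theta_hat)
corrected = precision_corrected_upper_bound(theta_hat, se, k=1.0)
```

In this toy case the naive bound is driven by the noisy boundary point, while the corrected bound is determined by a precisely estimated interior point.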

6. Conclusion 

In this paper we provided a novel method for inference on intersection bounds. Bounds 
of this form are common in the recent literature, but two issues have posed difficulties 
for valid asymptotic inference and bias-corrected estimation. First, the application of 
the supremum and infimum operators to boundary estimates results in finite-sample 
bias. Second, unequal sampling error of estimated boundary functions complicates infer- 
ence. We overcame these difficulties by applying a precision-correction to the estimated 
boundary functions before taking their intersection. We employed strong approximation 
to justify the magnitude of the correction in order to achieve the correct asymptotic 
size. As a by-product, we proposed a bias-corrected estimator for intersection bounds 
based on an asymptotic median adjustment. We provided formal conditions that justi- 
fied our approach in both parametric and nonparametric settings, the latter using either 
kernel or series estimators. As such, our method is the first to provide valid inference 
for nonparametric specifications of a continuum of conditional moment inequalities. 

^ To check sensitivity to the choice of critical values, we obtained the corresponding confidence intervals
using the critical values in (3.12). The resulting 90% confidence intervals are almost
identical: [1.96, 2.88] for high school wages and [2.30, 3.46] for college wages.

At least two of our results may be of independent interest beyond the scope of
inference on intersection bounds. First, our result on the strong approximation of series
estimators is new. This essentially provides a functional central limit theorem for any
series estimator that admits a linear asymptotic expansion, and is applicable quite
generally. Second, our method for inference applies to any value that can be defined as
a linear programming problem with either a finite- or infinite-dimensional constraint set.
Estimators of this form can arise in a variety of contexts, including, but not limited
to, intersection bounds. We therefore anticipate that although our motivation lay in
inference on intersection bounds, our results may have further application.

Appendix A. Proofs 

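Proof of Theorem 1. We prove the results assuming condition C*.1 only, since
condition C.1 is a special case with $\mathcal{E}_n(V) = \mathcal{E}_\infty(V)$.

Part 1. Observe that

$$\begin{aligned}
P[\theta_0 \le \hat\theta_p] &= P[\inf_{v \in \hat V}[\hat\theta(v) - \theta_0 + [b_n + c(p)/a_n]s(v)] \ge 0] \\
&\ge P[\inf_{v \in \hat V}[\hat\theta(v) - \theta(v) + [b_n + c(p)/a_n]s(v)] \ge 0] \\
&= P[a_n[\hat\theta(v) - \theta(v)]/s(v) + a_n b_n \ge -c(p), \forall v \in \hat V] \\
&= P[a_n[Z_n(v) - b_n] \le c(p), \forall v \in \hat V] \\
&= P[a_n[\sup_{v \in \hat V} Z_n(v) - b_n] \le c(p)] \\
&= P[a_n[\sup_{v \in V} Z_n(v) - b_n] \le c(p) + o_p(1)],
\end{aligned}$$

where we used that $\theta(v) \ge \theta_0$ for all $v \in V$ as well as condition C.2. Then we observe
that, using condition C*.1(b) and the anti-concentration property,

$$\begin{aligned}
P[a_n[\sup_{v \in V} Z_n(v) - b_n] \le c(p) + o_p(1)] &= P[\mathcal{E}_n(V) \le c(p) + o_p(1)] \\
&\ge P[\mathcal{E}_n(V) \le c_n(p) + o_p(1)] \\
&\ge P[\mathcal{E}_n(V) \le c_n(p)] - P[\mathcal{E}_n(V) \in [c_n(p) \pm o_p(1)]] \\
&\ge p - o(1),
\end{aligned}$$

which proves part 1.

Part 2. Using Condition C*.1(a) and C.2, we obtain

$$\begin{aligned}
P[\theta_0 \le \hat\theta_p] &= P[\inf_{v \in \hat V}[\hat\theta(v) - \theta_0 + [b_n + c(p)/a_n]s(v)] \ge 0] \\
&= P[a_n[\sup_{v \in \hat V} (\theta_0 - \hat\theta(v))/s(v) - b_n] \le c(p)] \\
&= P[a_n[\sup_{v \in V_0} (\theta_0 - \hat\theta(v))/s(v) - b_n] \le c(p) + o_p(1)] \\
&= P[a_n[\sup_{v \in V_0} Z_n(v) - b_n] \le c(p) + o_p(1)] \\
&= P[\mathcal{E}_n(V_0) \le c_n(p) + o_p(1)] \\
&= P[\mathcal{E}_n(V_0) \le c_n(p)] + w_n, \quad \text{where } |w_n| \le P[\mathcal{E}_n(V_0) \in [c_n(p) \pm o_p(1)]] \\
&= p + o(1),
\end{aligned}$$

where we also used that $\theta(v) = \theta_0$ for all $v \in V_0$, and the continuity of the limit
distribution of $\mathcal{E}_\infty(V_0)$. □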

Proof of Lemma 1. From the Donsker condition and by the Continuous Mapping
Theorem, we have that

$$\sup_{v \in V_0} Z_n(v) =_d \mathcal{E}_\infty(V_0) + o_p(1) = \sup_{v \in V_0} Z_\infty(v) + o_p(1).$$

Moreover, the distribution of the limit variable is continuous by the non-degeneracy of
the covariance kernel. This verifies condition C.1(a).

By stochastic equicontinuity, we have that

$$\left|\sup_{v \in \hat V} Z_n(v) - \sup_{v \in V} Z_n(v)\right| \le \sup_{|v - v'| \le d_H(\hat V, V)} |Z_n(v) - Z_n(v')| = o_p(1),$$

for any sequence of sets $\hat V$ such that $d_H(\hat V, V) = O_p(r_n) = o_p(1)$. This implies condition
C.2. □

Proof of Lemma 2. This is immediate from the statement of the conditions. □

Proof of Lemma 3. We have by Taylor expansion that

»("--'° + °^(l» '(ii./.Af + o,(l)) 

0*7 

=d g{vyAf + Op{l). 
Then in $\ell^\infty(V)$,

$$Z_n(v) = \frac{\theta(v) - \hat\theta(v)}{s(v)} =_d \frac{g(v)'\mathcal{N} + o_p(1)}{\|g(v)\| + o_p(1)}. \quad \square$$



Proof of Lemma 4. By assumption we have that $a_n \cdot \sup_{v \in V} |Z_n(v) - Z_n'(v)| = o_p(1)$
for any $a_n$, including the one stated in the Lemma. Now we set $\mathcal{E}_n(V) = a_n[\sup_{v \in V} Z_n'(v) - b_n]$.
This random variable need not have a limit distribution, but its exact distribution can
be obtained by simulation. Thus, in our words, its distribution is known.
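
A minimal sketch of such a simulation is given below, assuming the approximating process takes the form $g_n(v)'\mathcal{N}_n/\|g_n(v)\|$ on a finite grid of points $v$, with $\mathcal{N}_n$ a standard normal vector. The function name, the grid, the polynomial rows of `G`, and the default values of `a_n` and `b_n` are illustrative assumptions.

```python
import numpy as np

def simulate_sup_quantile(G, p, a_n=1.0, b_n=0.0, n_draws=20000, seed=0):
    """Simulated p-quantile of a_n * (sup_v alpha(v)'N - b_n), N ~ N(0, I_K).

    G: array of shape (number of grid points, K); row v holds g_n(v)'.
    """
    rng = np.random.default_rng(seed)
    # Normalize each row so that alpha(v)'N has unit variance at every v.
    alpha = G / np.linalg.norm(G, axis=1, keepdims=True)
    N = rng.standard_normal((alpha.shape[1], n_draws))
    sup_Z = (alpha @ N).max(axis=0)  # supremum over the grid, one value per draw
    return float(np.quantile(a_n * (sup_Z - b_n), p))

# Hypothetical example: cubic polynomial basis evaluated on a grid in [0, 1].
G = np.vander(np.linspace(0.0, 1.0, 50), 4)
c50 = simulate_sup_quantile(G, 0.50)
c95 = simulate_sup_quantile(G, 0.95)
```

With a single grid point the supremum is a standard normal variable, so the simulated 0.95 quantile should be close to 1.645; this gives a quick sanity check on the routine.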

Case 1 (Dimension of regressor v is one). In this case, we can also use
Hotelling's tubing method to conservatively estimate the quantiles of $\mathcal{E}_n(V)$. Expressions
for dimensions greater than one are less tractable, but they can also be stated at the
cost of complicated notation.

Indeed, from the Hotelling-Naiman tubing method we obtain that

$$P[\sup_{v \in V} Z_n'(v) \ge k] \le (1 - \Phi(k)) + \frac{\kappa_n(V)}{2\pi} e^{-k^2/2},$$

where $\kappa_n(V) = \int_V \|\nabla\alpha_n(v)\|\,dv$. As $k \to \infty$, we have

$$P[\sup_{v \in V} Z_n'(v) \ge k] \le \frac{\kappa_n(V)}{2\pi} e^{-k^2/2}[1 + o(1)].$$

For any $p$, we choose $k = k_n(p) = a_n + p/a_n$. Note that

$$a_n = \sqrt{2\log(\kappa_n(V)/2\pi)} \iff \frac{\kappa_n(V)}{2\pi} e^{-a_n^2/2} = 1.$$

Then

$$P[\sup_{v \in V} Z_n'(v) \ge k_n(p)] \le \exp\left(-p - \frac{p^2}{2a_n^2}\right)[1 + o(1)],$$

or equivalently

$$P[a_n[\sup_{v \in V} Z_n'(v) - a_n] \ge p] \le \exp\left(-p - \frac{p^2}{2a_n^2}\right)[1 + o(1)]. \quad (A.1)$$

Using the above relations, we conclude that the quantiles of $\mathcal{E}_n(V)$ can be estimated
conservatively by the quantiles of an exponential distribution or by the quantiles of
an exponential-distribution-like function $F_n(p) := 1 - \exp\left(-p - \frac{p^2}{2a_n^2}\right)$. Thus, we have
established N.1(b) when the dimension of the regressors equals one.
Case 2 (Dimension of regressor v is arbitrary). With regressors of higher dimension,
we can also use the following argument. Since the metric entropy of $\{Z_n'(v), v \in V\}$
under the $L_2$ pseudometric $\rho$ satisfies

N{e, V,p)< I ] , L — sup ||Va„(v)|| • diam(\/) < for some constant p < 00, 

we have by Samorodnitsky-Talagrand's inequality (van der Vaart and Wellner (1996), 
p. 442) that for i large enough and some constant C 

P{supZM>£)<2{C-L)'{l-m)- 
vev 

Since 

7;- Vlog2(C.L)^< v/b^, 
the bound implies by Feller's inequality 

P(supZ»>£)<i^e-^^/^+^"/^ r„<Vb^ 

We thus conclude that 



sup |Z;(t;)| = 0,(0^). 



vev 



For any p, we choose k = kn{p) = an + p/ani where a„ is the largest solution to 

^ (C-L)''a;V"/2 = l 



^27r 

which implies that as n — > 00, using the assumption that sup„ ||V^g'n('y)/||5'n('y 
an ~ ^J2dlog{2LC/V2^) ~ ^ 2d\og{2L / ^2^) < ^j2d\ogK. 

Then 

P[sup Z'Sv) > kn{p)] < ^ cxp (^-p - ^) [1 + o{l)] < exp (^~p - ^) [1 + 0(1)], 
equivalently 

PK[sup Z'^{v) - an] >p]< exp (^-p - [1 + o(l)]. (A.2) 

Thus, we have established N.l(b). □. 
Proof of Lemma 5. In order to establish N.2, we can use a crude approach based
on Cauchy-Schwarz inequalities

supZ'^{v) - sup Z'^{v)\ < a„ sup - a„(f '))'A/'„| 

v£V ""S^ \v-v'\<rn 

< a„ sup ||Va„(t;)||||f - f' 

\v—v'\<rn,\v—v'\<rn 



< a„sup \\Vaniv)\\rnOp{VK). 

Provided $a_n \sup_{v \in V} \|\nabla\alpha_n(v)\| r_n \sqrt{K} \to 0$, we have the result. However, a substantially
better condition follows from a careful use of Samorodnitsky-Talagrand's inequalities for
Gaussian processes, as shown below. This strategy has been used by Belloni
and Chernozhukov (2007) to bound oscillations of Gaussian processes of increasing
dimension over vanishing neighborhoods. Here, we adapt their strategy to our case, which
is quite different due to the particular structure of the function $\alpha_n(v)$.

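We will use the following Samorodnitsky-Talagrand maximal inequality for Gaussian
processes (Proposition A.2.7 in van der Vaart and Wellner (1996)). Let $X$ be a separable
zero-mean Gaussian process indexed by a set $T$. Suppose that for some $\kappa > \sigma(X) =
\sup_{t \in T} \sigma(X_t)$ and $0 < \epsilon_0 \le \sigma(X)$, we have

$$N(\epsilon, T, \rho) \le \left(\frac{\kappa}{\epsilon}\right)^v, \quad \text{for } 0 < \epsilon < \epsilon_0,$$

where $N(\epsilon, T, \rho)$ is the covering number of $T$ by $\epsilon$-balls with respect to the standard deviation
metric $\rho(t, t') = \sigma(X_t - X_{t'})$. Then there exists a universal constant $D$ such that for
every $\lambda \ge \sigma^2(X)(1 + \sqrt{v})/\epsilon_0$ we have

$$P\left(\sup_{t \in T} X_t \ge \lambda\right) \le \left(\frac{D\kappa\lambda}{\sqrt{v}\,\sigma^2(X)}\right)^v \left(1 - \Phi(\lambda/\sigma(X))\right).$$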

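We apply this result to the zero-mean Gaussian process $X_n : V \times V \to \mathbb{R}$ defined as

$$X_{n,t} = (\alpha_n(v) - \alpha_n(v'))'\mathcal{N}_n, \quad t = (v, v') : |v - v'| \le r_n.$$

It follows that $\sup_{t \in T} X_{n,t} = \sup_{|v - v'| \le r_n} (\alpha_n(v) - \alpha_n(v'))'\mathcal{N}_n$. For the process $X_n$ we
have:

$$\sigma(X_n) \le \sup_{|v - v'| \le r_n} \|\alpha_n(v) - \alpha_n(v')\| \le \sup_{v \in V} \|\nabla\alpha_n(v)\| r_n.$$

Furthermore we have that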



X(e,T,p) < I , L := sup-||Va„(t;)|| ■ r„ ■ diam(V^) < sup ||Va„( 



V]\\ -Vr, 
so that the bound on covering numbers holds with $\kappa \le L$ and $v = d$. Applying the
Samorodnitsky-Talagrand inequality, we conclude that for every $\ell \to \infty$, with $\epsilon_0 = \sigma(X_n)$,

Pr{supX„,i > £a(X„)} < (1 - $(£)) ^ 0. 
teT 

Therefore, we conclude that sup^g^Xj = Op((T(X„)). Thus, 

sup Z'^{v) - sup Z'^{v) < an sup \{an{v) - an{v'))'N'n\ 

v£V ^'^'^ \v-v'\<rn 

= Op(a„ sup ||Va„(t;)||r„), 

which is Op{l) by our assumption. Hence, we have shown that 

ttn ■ sup Z'^{v) — a„ ■ sup Z'^{v) = Op(l). 

Since a„ = &n, it only remains to show that (a„ — a„) sup^^y ) — >p 0. Note that 
sup^gy Z'^{v) = Op(a„) and (a„ - a„) = (a^ - a^)/ (a„ + a„). Therefore, 

(an - an) sup Z'^{v) = Op(al - al), 

which is Op(l) by assumption. □. 

Proof of Lemma 6. Arguments similar to those used in the proof of Lemma 3.4 of
Ghosal, Sen, and van der Vaart (2000) yield

where 

sup Kiv)\ = Op (hn^/\og h-A . 
Now note that Conditions K.1 and K.2, along with (A.3), imply that

$$\sup_{v \in V} |Z_n(v) - Z_n'(h_n^{-1}v)| = o_p(a_n(V)^{-1}).$$

Since the distribution of $Z_n'(s)$ does not depend on $n$, for the purpose of statistical
inference it suffices to consider the asymptotic behavior of a Gaussian process, say $Z'(s)$,
that has the same covariance function as $Z_n'(s)$. We first derive the asymptotic behavior
of the tail probability of the maximum of $Z'(s)$ over $s$ on a set $S$ with a fixed measure,
mes($S$). Define

$$\Psi(a) = \frac{1}{\sqrt{2\pi}} \int_a^\infty \exp\left(-\frac{1}{2}x^2\right) dx.$$

Recall that

$$\lambda = \frac{-\int K(u)K''(u)\,du}{\int K^2(u)\,du}.$$

We can prove that

$$\Pr\left[\max_{s \in S} Z'(s) > a\right] = \mathrm{mes}(S)\left(\frac{\lambda}{2\pi}\right)^{d/2} a^d\,\Psi(a)[1 + o(1)] \quad (A.4)$$



as a — > oo. To show this, we use the double sum method developed in Piterbarg (1996), 
applying in particular Piterbarg's Lemma 7.1. Note that for each j, 

A 



and that p(sj 



for Sj > 2 {K has support in [—1, 1]). Hence, 



Pd 



as s ^ 0. Thus, the Gaussian process Z'{s) has a stationary structure (^E^'^\ a^'^^) with 
C = diag(v^, . . . , VV^), E^"^^ = (1, . . . , 1) and a^^) = (2, . . . , 2) (using the notation 
in Piterbarg (1996)). Then an application of Corollary 7.1 of Piterbarg (1996) gives 



Pr ( m^ Z'(s) > a] = H 



mes(5)a'^^(a)(l + o(l)) 



(A.5) 



as $a \to \infty$, where $H_{E^{(d)},\alpha^{(d)}}$ is Pickands' constant (see Section 4 of Piterbarg (1996)
for its definition). In our case, $H_{E^{(d)},\alpha^{(d)}} = \pi^{-d/2}$ by (F.4) and Lemma 6.4 of Piterbarg
(1996). Then (A.4) follows immediately from (A.5).

Then arguments almost identical to those used in the proof of Theorem A.3 of Lee,
Linton, and Whang (2009), which is based on the proof of Theorem G.1 of Piterbarg
(1996), yield the following: for any $x$,



Pr a„iV) 



supZ" {h^^v) - an{V) 



< X 



exp < — exp 



X 



-X 



1 + 



X 



anivy 



(A.6) 



where $a_n(V)$ is defined in (3.7). Since $a_n(V) \to \infty$, (3.8) is proved. □

Proof of Lemma 7. This lemma can be proved using arguments almost identical
to those used to prove Lemma 5. In particular, as in the proof of Lemma 5, we apply
Samorodnitsky-Talagrand's maximal inequality to the following zero-mean Gaussian
process $X_n : V \times V \to \mathbb{R}$, which is defined as

$$X_{n,t} = (\alpha_n(v) - \alpha_n(v'))'U_n, \quad t = (v, v') : |v - v'| \le r_n,$$

with $\alpha_n(v) = w_n(v)/\|w_n(v)\|$.



Then we have that 



sup Z'j^lv) - sup Z'^{v) < a„ sup - a„(i;'))'Un| 

v£V \v-v'\<r„ 

= Op{an sup ||Va„(t;)||r„ 



This proves the first conclusion of the lemma. To show the second conclusion of the 
lemma, note that 

|Vm„M|| , ||V||u;„(t;)|||| 



Wr,\V 



w„\v 



Furthermore, since 



we have that 

V\\Wn{v)\ 
lIV^nMll 



\Wn(V 



v-Vi 

hr. 



1/2 



n d 



VK 

v-Vi 



K 



-| 2>, 1/2 



where $\nabla_j K$ and $\nabla_j f_V$ are the $j$-th elements of $\nabla K$ and $\nabla f_V$. Then under Condition K,
$\|\nabla\alpha_n(v)\|$ is at most $O_p(h_n^{-1})$ uniformly over $v$. Therefore, we have proved the second
conclusion of the lemma. □

Proof of Theorem 2. Let



Cn = C„SUps(w), 7„^.n 

vev 



Cn, 6^0 = min6'(f) + 4c„ 



Note that wp 1, sup^gyJ6'(f )] < 6*0 + e. This follows from two observations. First, by 
construction ^„ = Op(Cc„), so wp 1 

sup[^(t;)] < sup[^(t;) + Op(C„)] < sup[^(t;) + (C/2)cJ < + e + (C/2)c,. 
Second, wp



Hence wp — > 1 
Next, 



^o + e > mi{e{v)-OpiCn)+Lcn} + e, 

> inf {e{v) - {£n/2)Cn + inCn} + 6, 

> ■mi{e{v) + iL/2)cn} + e, 

> eo + {ij2)cn + e. 



K C K and sup d{v, K) = 0. 



sup d{v, K) = sup{d{v, K) : ^'(f ) < Oq + e} 

< sup{div, K) : eiv) -00- e< Op(Cn) + In} 

< snp{d{v, K) : 9{v) - 9o - ^ < In ■ (l + Op(l))} 

< sup{t/(t;, K) : (ct/(t;, K))^^^^ A 6 < 7n(l + 0^(1))} 

< sup{a; : {cxY^'^ A5 < 7„(1 + Op(l))} 

^ [(7n + 0p(l))]^/^(-) ^ ^_ 

c 

The first claim of the theorem follows. The second claim follows from the inclusion
K ^ K, so that 

mes(V; \ K) < [sup d{v, K)]' <p hn)'^'^'^. □ 

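Proof of Lemma 8. Take any $v \notin V_\epsilon$. A projection of $v$ on the set $V_\epsilon$ is defined as

$$v_\epsilon \in \arg\min_{v' \in V:\ \theta(v') - \theta_0 \le \epsilon} \|v - v'\|.$$

The Lagrangian characterization of the solution to this problem is of the form

$$v - v_\epsilon = \lambda\nabla\eta(v_\epsilon)$$

for some scalar $\lambda \ge 0$. This is true because the solution is necessarily an interior one, by $V_\epsilon$
belonging to the interior of $V$ and the latter being a convex body in $\mathbb{R}^d$. Hence

$$v - v_\epsilon = \|v - v_\epsilon\| \frac{\nabla\eta(v_\epsilon)}{\|\nabla\eta(v_\epsilon)\|} = d(v, V_\epsilon) \frac{\nabla\eta(v_\epsilon)}{\|\nabla\eta(v_\epsilon)\|}.$$

By Taylor expansion we have that for some $v^*$ on the line joining $v$ and $v_\epsilon$,

$$\theta(v) - \theta_0 - \epsilon = \eta(v) - \theta_0 - \epsilon = \nabla\eta(v^*)'(v - v_\epsilon) = \nabla\eta(v^*)' \frac{\nabla\eta(v_\epsilon)}{\|\nabla\eta(v_\epsilon)\|}\, d(v, V_\epsilon).$$

If $d(v, V_\epsilon) > 0$ is small enough, say $d(v, V_\epsilon) \le d_0$, then by continuity of $\nabla\eta(v)$ we have
that

$$\nabla\eta(v^*)' \frac{\nabla\eta(v_\epsilon)}{\|\nabla\eta(v_\epsilon)\|} \ge c,$$

where $c = \inf_{v \in \partial V_\epsilon} \|\nabla\eta(v)\|/2$. Thus, for $\delta = \inf_{d(v, V_\epsilon) \ge d_0}(\theta(v) - \theta_0 - \epsilon) = \inf_{d(v, V_\epsilon) \ge d_0}(\eta(v) - \theta_0 - \epsilon) > 0$,

$$\theta(v) - \theta_0 - \epsilon \ge c\,d(v, V_\epsilon) 1\{d(v, V_\epsilon) \le d_0\} + \delta 1\{d(v, V_\epsilon) > d_0\} \ge (c\,d(v, V_\epsilon)) \wedge \delta.$$

Finally, note that $\delta > 0$ by continuity of $\theta(v)$, by the definition of $V_\epsilon$ as the $\epsilon$-argmin of $\theta(v)$,
and by $d_0 > 0$. □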

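Proof of Theorem 3. Recall that we construct the two-sided bands for the true
parameter value $\theta^*$ as follows. Let

$$\hat\Delta_n^+ = \hat\Delta_n 1[\hat\Delta_n > 0], \quad \text{where } \hat\Delta_n = \hat\theta^u_{1/2} - \hat\theta^l_{1/2}, \quad \text{and} \quad \hat p_n = 1 - \Phi(\tau_n\hat\Delta_n^+)\alpha,$$

where $\Phi(\cdot)$ is the standard normal CDF and $\tau_n \to \infty$ is a sequence of constants satisfying

$$\tau_n\{(a_n^j)^{-1} + b_n^j\}\bar s^j \to 0. \quad (A.7)$$

This condition implies that $\tau_n|\hat\Delta_n^+ - \Delta_n| \to_p 0$, where $\Delta_n = \theta_0^u - \theta_0^l$. We also define
$\bar s^j = \sup_{v \in V^j} s^j(v)$, $j \in \{u, l\}$.

Step 1. We use the notation

$$p_n = 1 - \Phi(\tau_n\Delta_n)\alpha, \quad \Delta_n^u = \theta_0^u - \theta^*, \quad \Delta_n^l = \theta^* - \theta_0^l.$$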
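
As a rough illustration of the level adjustment $\hat p_n = 1 - \Phi(\tau_n\hat\Delta_n^+)\alpha$, the sketch below computes the one-sided level used for each end of the two-sided band. The function names and the numerical inputs are hypothetical; the sequence $\tau_n$ must be chosen by the user subject to the rate condition stated in the proof.

```python
import math

def normal_cdf(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def adjusted_level(theta_u_half, theta_l_half, tau_n, alpha=0.05):
    """One-sided level p_hat = 1 - Phi(tau_n * max(delta_hat, 0)) * alpha.

    theta_u_half, theta_l_half: median-unbiased estimates of the upper and
    lower bounds; their difference estimates the width of the identified set.
    """
    delta_hat = theta_u_half - theta_l_half
    delta_plus = max(delta_hat, 0.0)  # truncate negative width estimates at zero
    return 1.0 - normal_cdf(tau_n * delta_plus) * alpha

p_point = adjusted_level(1.0, 1.0, tau_n=10.0)  # zero estimated width
p_wide = adjusted_level(3.0, 1.0, tau_n=10.0)   # clearly positive width
```

When the estimated width is zero the level is $1 - \alpha/2$, as for a point-identified parameter; when the width is large relative to $1/\tau_n$ the level approaches $1 - \alpha$.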



In what follows we allow $\theta^*$ to be an arbitrary sequence of constants within the identified
set, so that its value can change depending on $n$; likewise, we allow $\Delta_n \ge 0$ to change
with $n$.

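The probability that $\theta^*$ lies outside the confidence interval is

$$P\{\theta^* \notin [\hat\theta^l_{\hat p_n}, \hat\theta^u_{\hat p_n}]\} \le P\{\theta^* < \hat\theta^l_{\hat p_n}\} + P\{\theta^* > \hat\theta^u_{\hat p_n}\}.$$

Focusing on the second term, we have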

P W > 91 \=p\9-^>9-^-9*+ inf r {v) +{bl + c^ ) {v) 
from the definition of 9^ . We can show that for some e > and some \
P{0*>ei} < pS^S:{V^)>c:{pr.-e)+a:^ + eJi + o{l) 



< p \ £:iv-) > ci{p„ 



A" 



e)+al-^}+oil). 



A 



The first inequality follows similarly to the proof of Theorem 1, also using that $s^u \le \bar s^u$,
that $\hat p_n = p_n + o_p(1)$ so that for any $\epsilon > 0$, $\hat p_n \ge p_n - \epsilon$ with probability approaching one,
and the assumption on $\tau_n$. The second inequality follows from the anti-concentration
property. We can conclude analogously that



Pie* < 



< P { S^r) > 4 {pn -e)+al^} +0(1). 



B 



Thus we have that for each e > 

p\e* ^ 



hi hu 

Pn ' Pn 



<A + B + o{l). 



In Step 2 below we show that for each $\epsilon > 0$, $A + B \le \alpha + \epsilon + o(1)$, so that



< a + o(l). 



This gives us the required conclusion since 9* is an arbitrary sequence of constants within 
the identified set, dependent upon n. 

Step 2. Let $[0, \infty]$ be the standard one-point compactification of $[0, \infty)$, endowed
with the metric $d(x, y) = |\lambda(x) - \lambda(y)|$, where $\lambda(x) = 1 - \exp(-x)$. This space is
compact, so that every sequence in this space has a convergent subsequence.

Here we first consider sequences along which $\tau_n\Delta_n \to c \in [0, \infty]$, and show that

$$A + B \le \alpha + \epsilon + o(1) \quad \text{if } \tau_n\Delta_n \to c \in [0, \infty]. \quad (A.9)$$

Given this, we show that

$$A + B \le \alpha + \epsilon + o(1) \quad (A.10)$$

holds for every sequence by way of contradiction. Indeed, suppose that $A + B > \alpha + \epsilon + \delta$
for some $\delta > 0$ along a subsequence. Then we can find a convergent subsequence in $[0, \infty]$
with respect to $d$. Thus, we can find a subsequence such that $\tau_k\Delta_k \to c \in [0, \infty]$ and
$A + B > \alpha + \epsilon + \delta$ for $k$ large enough, which gives us a contradiction to (A.9).
We now have to show (A.9). Suppose first that $c = 0$ in (A.9); then in this case
$p_n = 1 - \alpha/2 + o(1)$ and

$$A \le 1 - (p_n - \epsilon) = \alpha/2 + \epsilon + o(1), \quad B \le 1 - (p_n - \epsilon) + o(1) = \alpha/2 + \epsilon + o(1).$$

Suppose that < c < 00 in (lA.QO . then by r„(a{)^^Sj 0, 

Since A„ = Aj^ + A", this imphes that for every subsequence there exists a further 
subsequence indexed by k such that (a) a^A^ ^ 00 or (b) a[A[ ^00. In case (a) we 
get pk = I — $(c)a + 0(1) and 

B < 1 - (p. - = $(c)a + 6 + 0(1), A<P > + 0(1) = 0(1); 

in case (b) we get $p_k = 1 - \Phi(c)\alpha + o(1)$ and

A<l-{pk-e)< <l>{c)a + e + o(l), B<P UUv') > a[^\ + o(l) = o(l). 



So we get for all such subsequences that $A + B \le \alpha + \epsilon + o(1)$. Given this, we can claim
that this relation holds for every sequence by way of contradiction. Indeed, suppose
that $A + B > \alpha + \epsilon + \delta$ for some $\delta > 0$ along a subsequence. But since we can find at least
one further subsequence along which $A + B > \alpha + \epsilon + \delta$ holds and that also
satisfies either case (a) or (b) above, we obtain a contradiction. □



Appendix B. Strong Approximations For Nonparametric Estimators 

B.l. Strong Approximations for Series Estimators. Here we establish strong ap- 
proximations for series estimators of the form considered in section 3.4. 

Theorem 4 (Strong Approximation for a Generic Series Estimator). Let $a_n$ be a
sequence of constants with $a_n \to \infty$. In this paper it suffices to consider $a_n = \sqrt{\log n}$. We
assume the following conditions on a generic series estimation problem. (a) The
series estimator $\hat\theta(v)$ for the function $\theta(v)$ has the form $\hat\theta(v) = p_n(v)'\hat\beta$, where $p_n(v) :=
(p_1(v), \ldots, p_K(v))$ is a collection of $K$-dimensional approximating functions such that
$K \to \infty$, and $\hat\beta$ is a $K$-vector of estimates. (b) The estimator $\hat\beta$ satisfies an asymptotically
linear representation around some $K$-dimensional vector $\beta$:

$$\Omega_n^{-1/2}\sqrt{n}(\hat\beta - \beta) = n^{-1/2}\Omega_n^{-1/2}Q^{-1}\sum_{i=1}^n p_n(V_i)\epsilon_i + r_n, \quad \|r_n\| = o_p(a_n^{-1}),$$

$$(V_i, \epsilon_i) \text{ are i.i.d. with } E[\epsilon_i p_n(V_i)] = 0, \quad E[\epsilon_i^2 p_n(V_i)p_n(V_i)'] =: S_n, \quad Q^{-1}S_n(Q^{-1})' =: \Omega_n,$$

where $Q$ is some non-random invertible matrix, which is not necessarily symmetric,
and the eigenvalues of $S_n^{-1}$ are bounded above by $s_n$. (c) The function $\theta(v)$ admits
the approximation $\theta(v) = p_n(v)'\beta + A_n(v)$, where the approximation error $A_n(v)$ satisfies
$\sup_{v \in V} \sqrt{n}|A_n(v)|/\|g_n(v)\| = o(a_n^{-1})$, $g_n(v) := p_n(v)'\Omega_n^{1/2}$. (d) Finally, $E[|\epsilon_i|^3]$ and
$\sup_{v \in V}\max_j |p_{nj}(v)|$ are uniformly bounded in $n$, and $a_n^6 s_n^3 K^5/n \to 0$. Then we can find
a random normal vector $\mathcal{N}_n = N(0, I_K)$ such that

$$\|\Omega_n^{-1/2}\sqrt{n}(\hat\beta - \beta) - \mathcal{N}_n\| = o_p(a_n^{-1}).$$

As a consequence we obtain the following approximation for the series estimator:

$$\sup_{v \in V}\left|\frac{\sqrt{n}(\hat\theta(v) - \theta(v))}{\|g_n(v)\|} - \frac{g_n(v)'\mathcal{N}_n}{\|g_n(v)\|}\right| = o_p(a_n^{-1}).$$

Remarks. Sufficient conditions for linear approximation (b) are well known in the 
literature on series estimation, e.g. Andrews (1991) and Newey (1995). Conditions 
imposed in (a)-(c) are rather weak. The condition on the boundedness of compo- 
nents pj of the vector p is weak, and is satisfied by B-splines, trigonometric series, 
and a variety of other bases. As shown in the proof, the Condition (b), namely that 
sup^gymaxj |pj(f)| < CO and s\a^K^/n can be replaced by an alternative condi- 
tion, which is sf/'^a^ max^gv Y.j=i \Pj{v)f/n^^'^ 0, which will cover more general 
cases. 
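
To make the objects in Theorem 4 concrete, the following sketch computes a least-squares series estimate $\hat\theta(v) = p_n(v)'\hat\beta$ together with sample analogs of $\Omega_n = Q^{-1}S_n(Q^{-1})'$ and the pointwise standard error $\|g_n(v)\|/\sqrt{n}$. The polynomial basis, the function names, and the simulated data are illustrative assumptions; the theorem itself covers more general bases and estimators.

```python
import numpy as np

def series_fit(V, Y, degree=3):
    """Least-squares series fit with a polynomial basis (an illustrative choice)."""
    P = np.vander(V, degree + 1, increasing=True)    # rows are p_n(V_i)'
    n = len(Y)
    beta_hat, *_ = np.linalg.lstsq(P, Y, rcond=None)
    resid = Y - P @ beta_hat
    Q_hat = P.T @ P / n                              # sample analog of Q
    S_hat = (P * resid[:, None] ** 2).T @ P / n      # sample analog of S_n
    Q_inv = np.linalg.inv(Q_hat)
    Omega_hat = Q_inv @ S_hat @ Q_inv.T              # sample analog of Omega_n
    return beta_hat, Omega_hat

def series_predict(v_grid, beta_hat, Omega_hat, n):
    """Return theta_hat(v) and the standard errors ||g_n(v)|| / sqrt(n)."""
    P_grid = np.vander(v_grid, len(beta_hat), increasing=True)
    theta_hat = P_grid @ beta_hat
    # ||g_n(v)||^2 = p_n(v)' Omega_n p_n(v), computed row by row.
    g_norm = np.sqrt(np.einsum("ij,jk,ik->i", P_grid, Omega_hat, P_grid))
    return theta_hat, g_norm / np.sqrt(n)

# Hypothetical simulated data with a linear true function.
rng = np.random.default_rng(1)
V = rng.uniform(0.0, 1.0, size=400)
Y = 1.0 + 2.0 * V + 0.1 * rng.normal(size=400)
beta_hat, Omega_hat = series_fit(V, Y)
grid = np.linspace(0.0, 1.0, 11)
theta_hat, se = series_predict(grid, beta_hat, Omega_hat, n=len(Y))
```

The studentized process of the theorem is then approximated at each grid point by `P_grid @ sqrtm(Omega_hat) @ N / g_norm` for a standard normal draw `N`; only the norm `g_norm` is needed for the standard errors.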

Proof of Theorem 4. The proof has two steps: in the first, we couple the estimator
$\sqrt{n}(\hat\beta - \beta)$ with the normal vector; in the second, we establish the strong approximation
for the series estimate of the function.

Step 1. Here we shall apply Yurinskii's coupling; see Yurinskii (1977) and Pollard (2002,
page 244).

Let $\xi_1, \ldots, \xi_n$ be independent $K$-vectors with $E\xi_i = 0$ for each $i$, and $\Delta := \sum_i E\|\xi_i\|^3$
finite. Let $S = \xi_1 + \cdots + \xi_n$. For each $\delta > 0$ there exists a random vector $T$ with a
$N(0, \mathrm{var}(S))$ distribution such that

$$P\{\|S - T\| > 3\delta\} \le C_0 B\left(1 + \frac{|\log(1/B)|}{K}\right), \quad \text{where } B := \Delta K\delta^{-3},$$

for some universal constant $C_0$.

In order to apply the coupling, consider 



Then we have that 



< •i^^/^maxlf]|p„,(.)|^E 



3/2 



< • max sup \pnj (w) | | | ^ 

~ ''n -"^ ' 

using the assumption that $E|\epsilon_i|^3$ and $\max_j \sup_{v \in V} |p_{nj}(v)|$ are uniformly bounded in $n$.
Therefore, by Yurinskii's coupling, for each $\delta > 0$



(5nV2) 
by {anfslK''/n ^ 0. 

This proves the first part of the lemma. Also, to justify the remark given after the 
lemma, we have that 
Therefore, by Yurinskii's coupling, for each $\delta > 0$



-A4 



n 



n I ~ 



K 



Finally by combining the preceding step with the assumption on the linearization 
error r„, we obtain 



^ n ^ n 



Step 2. Using the result of Step 1 and that 



\9n[V) 



\9n[V) 



we conclude that 



\Sn{v)\ :-- 



< 



\\9niv)\\ \\9niv)\\ 



(B.i: 



uniformly in f G V. Finally, 

V^(e{v) - e{v)) gn{v)'Un 



sup 



< sup 



+ sup 



\\9n[v)\\ \\9n\V)\ 

^{e{v) - e{v)) v^grXvyn-'/\p - p) 



\\9n{v)\\ \\gn{v)\\ 
^gn{vyn-''\P - (3) gnivyMn 



\\9n{V}\\ \\9n{V}\ 

= sup \y/nAn{v)/\\gn{vy\\ + sup \Sn{vy = Opia'^) + Op{a~'), 

using the assumption on the approximation error $A_n(v) = \theta(v) - p_n(v)'\beta$ and the bound
(B.1). □



B.2. Strong Approximations for Kernel-Type Estimators. This section provides
low-level sufficient conditions for K.1 and K.2. In particular, we focus on the case when
a bound-generating function $\theta(\cdot)$ is estimated by a kernel-type estimator of conditional
expectation functions. Let $F_{U|V}(\cdot|v)$ denote the cumulative distribution function of $U$
given $V = v$.

Theorem 5. Assume that (1) the joint distribution of {U,V) is absolutely continuous 
with respect to Lebesgue measure on [0,1]'^"'"^; (2) fv{v) and are Lipschitz con- 

tinuous and bounded away from zero on their support [0, l]'^; (3) a{v) is continuously 
differentiable and its derivative is bounded; (4) Fj^^y{6\v) is bounded uniformly in {€,v) 
and its partial derivatives with respect to e and v are also uniformly bounded; (5) as 
n — > OO; the kernel estimator of 6{v) has an asymptotic linear expansion: 

where K. is a d-dimensional kernel function with compact support, [—1, 1]'', J 'K{'u)du = 1, 
and is twice continuously differentiable, hn is a sequence of bandwidths that converges to 
zero, and the remainder term satisfies 



(6) Further, assume that 



sup \Rn{v) \ = Op(a„ ); 



nhi a„logn 

oo and ,,,,,,,, U. 



a„(logn)2 nV(rf+i)/i, 

Then there exists a sequence of Gaussian processes Gn{-), indexed by V, with continuous 
sample paths and with 

E[GM]^0 forteV, 
E[Gr.{v^)Gn{v2)] = E[<f>f,^,MV)<l>hr.,vMV)] 
for Vi and f 2 G V, such that 

V^\ GJv) 



sup 



O 



1 ( V — \ 

(nKy/^fviv) g ""^^^^'^ ; h^fviv) 

^^-1 logn)'/' + (nh^)-'/^ logn] a.s. 



Condition (1) assumes that $(U, V)$ are continuous random variables with support
on the unit cube. There is no loss of generality in restricting the support to be the
unit cube, provided that the support is known and is a Cartesian product of compact
connected intervals. The bounded support assumption on $U$ is standard in settings with
partial identification. Otherwise, the bound may not exist. Conditions (2)-(4) are mild
smoothness conditions. Condition (5) provides standard regularity conditions for kernel
estimation. This holds for kernel mean regression estimators and also local polynomial
estimators under fairly general conditions. One important restriction that is implicit
in the asymptotic expansion is that the asymptotic bias is negligible. This could be
achieved by undersmoothing, which would prevent us from using optimal bandwidths.
Alternatively, one could use bias-corrected one-sided confidence intervals. Condition
(6) ensures that



sup 



Vi 



Gr 



a.s. 



Proof. To prove this theorem, we use Theorem 1.1 of Rio (1994). Define $\epsilon = F_{U|V}(U|V)$.
For any positive $h$, define $\phi_{h,s}(\epsilon, v) = \sigma(v) F^{-1}_{U|V}(\epsilon|v) K_d[h^{-1}(s - v)]$. For any real numbers
$a$ and $b$ satisfying $0 < a < b < 1$, let $\mathcal{K}_{a,b}$ be a class of functions



$$\mathcal{K}_{a,b} = \{\phi_{h,s} : s \in [0,1]^d,\ h \in [a,b]\}.$$



First, it is standard to show that $\mathcal{K}_{h/4,h}$ is a VC class of functions for each $h$. Second,
the UBV (uniformly of bounded variation) and LUBV (locally UBV) conditions of Rio
(1994) are satisfied. To see this, first note that for some universal constant $C < \infty$,



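$$\int_{[0,1]^{d+1}} \left( \left| \frac{\partial \phi_{h,s}(\epsilon,v)}{\partial \epsilon} \right| + \sum_{j=1}^{d} \left| \frac{\partial \phi_{h,s}(\epsilon,v)}{\partial v^{(j)}} \right| \right) d\epsilon\, dv \le C h^{d-1},$$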





where $v^{(j)}$ is the $j$-th element of $v$. Furthermore, as in equation (4.1) of Rio (1994), note
that for some universal constant $C < \infty$,

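$$\int_{C(\eta)} \left( \left| \frac{\partial \phi_{h,s}(\epsilon,v)}{\partial \epsilon} \right| + \sum_{j=1}^{d} \left| \frac{\partial \phi_{h,s}(\epsilon,v)}{\partial v^{(j)}} \right| \right) d\epsilon\, dv \le C h^{-1} \min\{\eta h^{d}, \eta^{d+1}\},$$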



where $C(\eta)$ is a cube in $\mathbb{R}^{d+1}$ with edges of length $\eta$. Then Theorem 1.1 of Rio (1994)
gives the following:



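$$\sup_{v \in \mathcal{V}} \left| \sum_{i=1}^{n} \phi_{h_n,v}(U_i,V_i) - n^{1/2} G_n(v) \right| = O\left[ (n^{d/(d+1)} h_n^{d-1} \log n)^{1/2} + \log n \right] \quad \text{a.s.} \qquad \text{(B.2)}$$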



Since the density $f_V(v)$ is bounded away from zero, Theorem 5 follows immediately
from (B.2). □

Appendix C. Implementation

In this Appendix we describe implementation of our procedure. We begin by detailing
the steps required for parametrically estimated bound-generating functions, and then
describe implementation for nonparametric cases. Finally, we describe how one-sided
bands for upper and lower bounds on $\theta^*$ can be combined to perform inference on either
$\Theta_I$ or $\theta^*$.

Note that below we focus on the upper bound, but if instead $\theta_0$ were the lower bound
for $\theta^*$, given by the supremum of a bound-generating function, the algorithm would be
entirely symmetric.

C.1. Parametric Boundary Estimation. We start by considering implementation
when the bound-generating function is estimated parametrically, i.e. where conditions
P.1 and P.2 hold. We provide a simple approach that relies on simulation from the
multivariate normal distribution:

(1) Compute a consistent set estimate $\hat{V}$ for the minimizing set $V_0$:

$$\hat{V} = \{v \in \mathcal{V} : \hat\theta(v) \le \inf_{v \in \mathcal{V}} \hat\theta(v) + \ell_n c_n\}$$

with $\ell_n = 2\sqrt{\log n} \cdot \sup_{v \in \mathcal{V}} s(v)$ and $c_n = 1$.

(2) For each $v \in \hat{V}$, compute $\hat{g}(v) = \partial\theta(v,\hat\gamma)/\partial\gamma \cdot \hat\Omega^{1/2}$, where $\hat\Omega$ is a consistent
estimator for the asymptotic variance of $\sqrt{n}(\hat\gamma - \gamma_0)$.

(3) Simulate a large number $R$ of draws from $N(0, I_K)$, denoted $Z_1, \ldots, Z_R$, where
$K = \dim(\gamma)$ and $I_K$ is the identity matrix, and compute $k(p)$ = $p$-quantile of
$\{\max_{v \in \hat{V}} (\hat{g}(v)' Z_r / \|\hat{g}(v)\|),\ r = 1, \ldots, R\}$.

(4) Compute $\hat\theta_p = \min_{v \in \mathcal{V}} [\hat\theta(v) + k(p) s(v)]$. Selecting $p = 1/2$ provides a median-unbiased
estimator for $\theta_0$, while selecting $p = 1 - \alpha$ provides a one-sided confidence
interval such that $P(\theta_0 \le \hat\theta_p) = 1 - \alpha$.
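Steps (1)-(4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `upper_bound_estimate` and its arguments are hypothetical, the set $\mathcal{V}$ is replaced by a finite grid, and the inputs $\hat\theta(v)$, $s(v)$ and $\hat{g}(v)$ are assumed to have been computed beforehand on that grid.

```python
import numpy as np

def upper_bound_estimate(theta_hat, s, g, n, p=0.5, R=5000, c_n=1.0, seed=0):
    """Hypothetical sketch of steps (1)-(4) on a finite grid of v values.

    theta_hat : (m,) estimates of theta(v) on the grid
    s         : (m,) pointwise standard errors s(v)
    g         : (m, K) array whose rows are g_hat(v)
    n         : sample size
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    s = np.asarray(s, dtype=float)
    g = np.asarray(g, dtype=float)
    rng = np.random.default_rng(seed)

    # Step (1): set estimate of the minimizing set.
    ell_n = 2.0 * np.sqrt(np.log(n)) * s.max()
    in_set = theta_hat <= theta_hat.min() + ell_n * c_n

    # Step (2) is taken as given through g; rows are scaled to unit length
    # so that each g(v)'Z / ||g(v)|| is standard normal.
    g_unit = g[in_set] / np.linalg.norm(g[in_set], axis=1, keepdims=True)

    # Step (3): simulate the maximum over the estimated set, take the p-quantile.
    Z = rng.standard_normal((R, g.shape[1]))
    k_p = np.quantile((Z @ g_unit.T).max(axis=1), p)

    # Step (4): precision-corrected minimum over the whole grid.
    theta_p = (theta_hat + k_p * s).min()
    return theta_p, k_p, in_set
```

Calling the function with `p=0.5` returns the median-unbiased estimate, and with `p=0.95` the endpoint of a one-sided 95% interval. When the estimated set contains a single point, the simulated maximum is a single standard normal variable, so $k(1/2)$ is close to zero and the estimate is close to the analog estimate.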

An important special case is when the support of $v$ is finite, so that $\mathcal{V} = \{v_1, \ldots, v_J\}$.
In this case, the algorithm above applies where $\theta(v,\gamma) = \sum_{j=1}^{J} \gamma_j 1[v = v_j]$, i.e. where
for each $j$, $\theta(v_j, \gamma) = \gamma_j$ and $\hat{g}(v) = (1[v = v_1], \ldots, 1[v = v_J]) \cdot \hat\Omega^{1/2}$.
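In the finite-support case the vectors $\hat{g}(v_j)$ are simply the rows of a square root of $\hat\Omega$. The sketch below illustrates this with a made-up $2 \times 2$ matrix; the helper name `finite_support_g` and the numerical values are hypothetical, and the Cholesky factor is used as the square root, which is an assumption of convenience (any $A$ with $AA' = \hat\Omega$ gives the same distribution for $\hat{g}(v)'Z$).

```python
import numpy as np

def finite_support_g(omega_hat):
    """Rows are g_hat(v_j) = e_j' A, where A A' = omega_hat (hypothetical helper)."""
    omega_hat = np.asarray(omega_hat, dtype=float)
    # Lower-triangular Cholesky factor L with L L' = omega_hat.
    return np.linalg.cholesky(omega_hat)

# Illustrative (made-up) estimate of the asymptotic variance of sqrt(n)(gamma_hat - gamma_0).
omega_hat = np.array([[1.0, 0.5],
                      [0.5, 2.0]])
g = finite_support_g(omega_hat)

# Squared row norms recover the diagonal of omega_hat, so g(v_j)'Z / ||g(v_j)||
# is a standardized version of the j-th component of a N(0, omega_hat) vector.
row_norms_sq = (g ** 2).sum(axis=1)
```

The maximum in step (3) is then the maximum of $J$ correlated standard normal variables whose correlations are those implied by $\hat\Omega$.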

Specifically, the steps of the algorithm would apply with the following two modifications. First, the set estimate
$\hat{V}_\epsilon$ in step (1) would be given by $\hat{V}_\epsilon = \{v \in \mathcal{V} : \hat\theta(v) \ge \sup_{v \in \mathcal{V}} \hat\theta(v) - \ell_n c_n - \epsilon\}$. Second, one would
subtract, rather than add, a precision adjustment from the analog estimates for the lower bound in step
(4), and then compute the maximum after applying this precision adjustment, i.e. $\hat\theta_p = \max_{v \in \mathcal{V}} [\hat\theta(v) - k(p) s(v)]$.
Note that now $k(p)$ approximates the $p$-quantile of $\max_{v \in \hat{V}_\epsilon} [\theta(v) - \hat\theta(v)] / s(v)$. However, no
changes need to be made to the computation of $k(p)$ due to the symmetry of the normal distribution.
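The two modifications for the lower bound amount to flipping an inequality and a sign. The following Python sketch shows this under the same assumptions as any grid-based implementation: the function name `lower_bound_estimate` and its arguments are hypothetical, $\mathcal{V}$ is a finite grid, and $\hat\theta(v)$, $s(v)$ and $\hat{g}(v)$ are supplied by the user.

```python
import numpy as np

def lower_bound_estimate(theta_hat, s, g, n, p=0.5, R=5000, c_n=1.0, eps=0.0, seed=0):
    """Hypothetical sketch of the lower-bound variant on a finite grid."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    s = np.asarray(s, dtype=float)
    g = np.asarray(g, dtype=float)
    rng = np.random.default_rng(seed)

    # Modification 1: the set estimate collects near-maximizers.
    ell_n = 2.0 * np.sqrt(np.log(n)) * s.max()
    in_set = theta_hat >= theta_hat.max() - ell_n * c_n - eps

    # k(p) is computed exactly as for the upper bound (normal symmetry).
    g_unit = g[in_set] / np.linalg.norm(g[in_set], axis=1, keepdims=True)
    Z = rng.standard_normal((R, g.shape[1]))
    k_p = np.quantile((Z @ g_unit.T).max(axis=1), p)

    # Modification 2: subtract the precision adjustment, then maximize.
    theta_p = (theta_hat - k_p * s).max()
    return theta_p, k_p, in_set
```

With `p=0.95` the returned value lies below the analog maximum by roughly $k(p)$ standard errors at the maximizing point.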
C.2. Nonparametric Boundary Estimation. Here we generalize the previous procedure
to nonparametric series and kernel boundary estimators. The basic steps are
the same, though some adjustments are necessary. In particular, the set estimator in
the first step will converge to $V_\epsilon$, which is generally not equal to $V_0$, but contains $V_0$
with probability approaching 1. Setting $\epsilon = 0$ may also be feasible, but this implicitly
puts more stringent growth restrictions on the number of series terms and bandwidth,
which may be difficult to verify in practice.

C.2.1. Series Estimators. In practice, implementation with a series estimator does not
substantially differ from the parametric case:

(1) Compute a consistent estimate $\hat{V}_\epsilon$ for $V_\epsilon$:

$$\hat{V}_\epsilon = \{v \in \mathcal{V} : \hat\theta(v) \le \inf_{v \in \mathcal{V}} \hat\theta(v) + \ell_n c_n + \epsilon\}$$

with $\ell_n = 2\sqrt{\log n} \cdot \sup_{v \in \mathcal{V}} s(v)$ and $c_n = \sqrt{\log n}$.

(2) For each $v \in \hat{V}_\epsilon$, compute $\hat{g}(v) = p_n(v)' \hat\Omega^{1/2}$, where $\hat\Omega$ is a consistent estimate
of the asymptotic variance of $\hat\beta$.

(3) Simulate a large number $R$ of draws from $N(0, I_K)$, denoted $Z_1, \ldots, Z_R$. Compute
$k(p)$ = $p$-quantile of $\{\max_{v \in \hat{V}_\epsilon} (\hat{g}(v)' Z_r / \|\hat{g}(v)\|),\ r = 1, \ldots, R\}$.

(4) Compute $\hat\theta_p = \min_{v \in \mathcal{V}} [\hat\theta(v) + k(p) s(v)]$. Selecting $p = 1/2$ provides a median-unbiased
estimator for $\theta_0$, while selecting $p = 1 - \alpha$ provides a one-sided confidence
interval such that $P(\theta_0 \le \hat\theta_p) = 1 - \alpha$.
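To make step (2) concrete, the sketch below builds $\hat\theta(v)$, $s(v)$ and $\hat{g}(v) = p_n(v)'\hat\Omega^{1/2}$ for a least-squares series fit with a scalar $v$. Several choices here are assumptions made for illustration and are not taken from the text: the power-series basis, the heteroskedasticity-robust form of $\hat\Omega$ (interpreted as the variance of $\sqrt{n}(\hat\beta - \beta)$), the convention $s(v) = \|\hat{g}(v)\|/\sqrt{n}$, and the function name `series_inputs`.

```python
import numpy as np

def series_inputs(v_obs, y, grid, degree=3):
    """Hypothetical helper: series fit and the inputs for steps (1)-(4)."""
    v_obs, y, grid = (np.asarray(a, dtype=float) for a in (v_obs, y, grid))
    n = len(y)

    # Basis p_n(v) = (1, v, ..., v^degree), evaluated at the data and on the grid.
    P = np.vander(v_obs, degree + 1, increasing=True)
    P_grid = np.vander(grid, degree + 1, increasing=True)

    # Least-squares coefficients and residuals.
    beta, *_ = np.linalg.lstsq(P, y, rcond=None)
    resid = y - P @ beta

    # Sandwich estimate of the asymptotic variance of sqrt(n)(beta_hat - beta).
    Q_inv = np.linalg.inv(P.T @ P / n)
    meat = (P * resid[:, None] ** 2).T @ P / n
    omega = Q_inv @ meat @ Q_inv

    # Symmetric square root of omega via its eigendecomposition.
    w, U = np.linalg.eigh(omega)
    omega_half = (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T

    g = P_grid @ omega_half                       # rows p_n(v)' Omega^{1/2}
    theta_hat = P_grid @ beta                     # fitted bound-generating function
    s = np.linalg.norm(g, axis=1) / np.sqrt(n)    # assumed standard-error convention
    return theta_hat, s, g
```

The three returned arrays are exactly the quantities that steps (1), (3) and (4) operate on, with $K$ equal to the number of series terms.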

We can also bypass simulation of the stochastic process by employing expansion (A.2)
in the proof of Lemma H] in Appendix A. This choice of k{p) is convenient because it 
does not involve simulation; however, it could be too conservative in some applications. 
Thus, we recommend using simulation in applications, unless the computational cost is 
too high. 

C.2.2. Kernel Estimators. The steps are as follows:

(1) Compute a consistent estimate $\hat{V}_\epsilon$ for $V_\epsilon$, as given in (3.14), e.g.

$$\hat{V}_\epsilon = \{v \in \mathcal{V} : \hat\theta(v) \le \inf_{v \in \mathcal{V}} \hat\theta(v) + \ell_n c_n + \epsilon\}$$

with $\ell_n = 2\sqrt{\log n} \cdot \sup_{v \in \mathcal{V}} s(v)$ and $c_n = \sqrt{\log n}$.

(2) For each $v \in \hat{V}_\epsilon$, compute $\omega_n(v)$ as given in condition K.1, using consistent
sample analog estimators.

(3) Simulate a large number $R$ of draws from $N(0, I_n)$, denoted $Z_1, \ldots, Z_R$. Compute
$k(p)$ = $p$-quantile of $\{\max_{v \in \hat{V}_\epsilon} (\omega_n(v)' Z_r / \|\omega_n(v)\|),\ r = 1, \ldots, R\}$.

(4) Compute $\hat\theta_p = \min_{v \in \mathcal{V}} [\hat\theta(v) + k(p) s(v)]$. Selecting $p = 1/2$ provides a median-unbiased
estimator for $\theta_0$, while selecting $p = 1 - \alpha$ provides a one-sided confidence
interval such that $P(\theta_0 \le \hat\theta_p) = 1 - \alpha$.
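Step (3) for kernel estimators differs from the series case only in that the simulated vectors have dimension $n$. The Python sketch below shows the simulation given a matrix whose rows are $\omega_n(v)$. Condition K.1 is not reproduced in this appendix, so the helper `hypothetical_weights` is only a stand-in: it assumes, purely for illustration, that the $i$-th entry of $\omega_n(v)$ is a residual times a triweight kernel weight. Both function names are hypothetical.

```python
import numpy as np

def kernel_critical_value(omega, p=0.5, R=2000, seed=0):
    """p-quantile of max_v omega_n(v)'Z / ||omega_n(v)|| with Z ~ N(0, I_n).

    omega : (m, n) array; row j is omega_n(v_j) for v_j in the set estimate.
    Rows must be nonzero, otherwise the normalization is undefined.
    """
    omega = np.asarray(omega, dtype=float)
    rng = np.random.default_rng(seed)
    unit = omega / np.linalg.norm(omega, axis=1, keepdims=True)
    Z = rng.standard_normal((R, omega.shape[1]))
    return np.quantile((Z @ unit.T).max(axis=1), p)

def hypothetical_weights(v_obs, resid, grid, h):
    """Stand-in for omega_n(v): residual_i * K((v - V_i) / h), scalar v.

    This form is an assumption for illustration, not the definition in K.1.
    The triweight kernel has support [-1, 1] and is twice continuously
    differentiable.
    """
    v_obs, resid, grid = (np.asarray(a, dtype=float) for a in (v_obs, resid, grid))
    u = (grid[:, None] - v_obs[None, :]) / h
    kern = np.where(np.abs(u) <= 1.0, (35.0 / 32.0) * (1.0 - u ** 2) ** 3, 0.0)
    return kern * resid[None, :]
```

For large $n$ the $R \times n$ matrix of draws can be generated in batches to limit memory use; the quantile is then taken over the pooled maxima.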

The researcher also has the option of employing an analytical approximation in place
of simulation if desired. Such critical values are provided by (3.10), (3.11), and (3.12),
all of which are asymptotically equivalent.

Figure 1. This figure illustrates how variation in the precision of the
analog estimator at different points may impede inference. The solid curve
is the true bound-generating function $\theta(v)$, while the dash-dot curve is a
single realization of its estimator, $\hat\theta(v)$. The lighter dashed curves depict
eight additional representative realizations of the estimator, illustrating
its precision at different values of $v$. The minimum of the estimator $\hat\theta(v)$
is indeed quite far from the minimum of $\theta(v)$, making the empirical upper
bound unduly tight.

Figure 2. This figure depicts a precision-corrected curve (dashed curve)
that adjusts the boundary estimate $\hat\theta(v)$ (dotted curve) by an amount
proportional to its point-wise standard error. The minimum of the
precision-corrected curve is closer to the minimum of the true curve (solid) than the
minimum of $\hat\theta(v)$, removing the downward bias.

Figure 3. This figure depicts a precision-corrected curve (dashed curve)
that adjusts the boundary estimate $\hat\theta(v)$ (dotted curve) by an amount
proportional to its point-wise standard error. The dash-dot curve
represents an improvement on the precision-corrected curve obtained by
employing an estimator for the set of minimizing values. The minimum of
this dash-dotted curve is closer to the minimum of $\theta(v)$ than the initial
precision-corrected curve.

Figure 4. This figure provides the estimated upper bound on the log
wages for college graduates. The minimum of the estimated boundary
function (dashed curve) occurs in the right-tail of the distribution, where
the curve is less precisely estimated. The estimate may therefore not
provide an accurate representation of the true boundary function in this
region. Our method employs the precision-corrected curve (solid curve)
to account for varying levels of precision of the estimate.

MIV-MTR Lower Bounds (College Graduates): One-Sided C.

Figure 5. This figure provides the estimated lower bound on the log
wages for college graduates. The maximum of the estimated boundary
function (dashed curve) is in a region where it is relatively precisely
estimated. The maximum of the precision-corrected curve (solid curve) is
therefore quite near the maximum of the estimated curve, though the
latter is slightly higher.
Table 1. Results for Monte Carlo Experiments [1,000 replications per experiment]

Notes: The "Analog" and "New" methods refer to the sample analog method and
our new proposed method. For each method, we report the mean and median biases,
standard deviation (SD), mean absolute deviation (MAD), root mean squared error
(RMSE), and empirical coverage probabilities at 50% and 95% levels.

Table 2. Descriptive Statistics (n = 2044)

Variable  Mean  Median  Std. dev.  Minimum  Maximum
Log hourly wages  2.50 0.58 0.28

Table 3. Estimation Results